Virtual Check Points: setup checklist

Many customers who rent cloud resources from us use virtual Check Points. They use them for different tasks: some control Internet access for a server segment or publish their services behind our equipment. Some need to run all traffic through the IPS blade, while others need Check Point as a VPN gateway so that branches can reach internal resources in the data center. There are also those who need to protect their cloud infrastructure to pass certification under FZ-152, but I'll cover that separately.
As part of my job, I support and administer Check Points. Today I'll go over what to consider when deploying a Check Point cluster in a virtual environment. I'll cover the virtualization layer, the network, the settings of Check Point itself, and monitoring.
I won't claim to reveal anything new: much of this is in the vendor's recommendations and best practices. But nobody reads those, so let's go.
Cluster mode
Our Check Points live in clusters. The most common installation is a two-node cluster in active-standby mode. If something happens to the active node, it goes inactive and the standby node takes over. Failover to the standby node is usually caused by synchronization problems between cluster members, the state of the interfaces or of the installed security policy, or simply heavy load on the equipment.
We do not use active-active mode in a two-node cluster.
If one node fails, the surviving node may not withstand the doubled load, and then we lose everything. If you really want active-active, the cluster should have at least 3 nodes.
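The arithmetic behind this is easy to check. Below is a minimal sketch, assuming traffic is spread evenly across the nodes and is redistributed evenly after a failure; the function name and the 60% figure are made up for illustration.

```python
def load_after_failure(nodes, load_per_node):
    """Per-node load after one node fails.

    Assumes the total load is spread evenly over the surviving nodes.
    load_per_node is a fraction of one node's capacity (0.6 = 60%).
    """
    if nodes < 2:
        raise ValueError("a cluster needs at least two nodes")
    total_load = nodes * load_per_node
    return total_load / (nodes - 1)


# Illustrative figure: every node runs at 60% of its capacity.
for cluster_size in (2, 3):
    survivor_load = load_after_failure(cluster_size, 0.6)
    print(cluster_size, "nodes ->", round(survivor_load * 100), "% per survivor")
```

At 60% per node, the survivor of a two-node cluster would need 120% of its capacity, while each survivor in a three-node cluster ends up at 90%.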
Network and Virtualization Settings
Multicast traffic between the SYNC interfaces of cluster members is allowed on the network equipment. If multicast is not possible, the synchronization protocol (CCP) is switched to broadcast. The nodes in a Check Point cluster synchronize with each other, and state changes are sent from node to node over multicast. Check Point uses a non-standard multicast implementation (a non-multicast IP address is used). Because of this, some equipment, such as Cisco Nexus switches, does not understand these messages and blocks them. In that case, switch to broadcast.

Description of the problem with Cisco Nexus and its solutions on the vendor portal.
Multicast traffic is also allowed at the virtualization level. If multicast cannot be used for cluster synchronization (CCP), use broadcast.
In the Check Point console, the cphaprob -a if command shows the CCP settings and operating mode (multicast or broadcast). To change the mode, use the cphaconf set_ccp broadcast command.
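If you collect the output of cphaprob -a if from several gateways, the check can be scripted. The sketch below is only an illustration: it assumes the output mentions the mode with the words "multicast" or "broadcast", and since the exact layout depends on the Check Point version, it does not try to parse anything else. The function names are hypothetical.

```python
import re


def ccp_modes(cphaprob_output):
    """Return the set of CCP modes mentioned in the saved command output.

    Assumption: the mode appears as the word "multicast" or "broadcast"
    somewhere in the text; the rest of the layout is ignored.
    """
    found = re.findall(r"\b(multicast|broadcast)\b", cphaprob_output, re.IGNORECASE)
    return {mode.lower() for mode in found}


def needs_broadcast_switch(cphaprob_output, multicast_allowed):
    """True if the network cannot pass multicast but CCP still uses it."""
    return (not multicast_allowed) and "multicast" in ccp_modes(cphaprob_output)
```

When needs_broadcast_switch returns True, that is the case where cphaconf set_ccp broadcast applies.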

Cluster nodes must be on different ESXi hosts. The reason is simple: if a physical host fails, the second node keeps working. This can be achieved with DRS anti-affinity rules.
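Node placement is also worth verifying from time to time. Here is a minimal sketch, assuming you already have a mapping of VM names to ESXi hosts from your inventory; how you obtain it is out of scope, and the names used here are hypothetical.

```python
from collections import defaultdict


def hosts_with_multiple_nodes(placement):
    """Return {host: [vm, ...]} for hosts that run more than one cluster node.

    placement maps a cluster node (VM name) to the ESXi host it runs on.
    An empty result means the anti-affinity requirement is met.
    """
    nodes_by_host = defaultdict(list)
    for vm_name, host in placement.items():
        nodes_by_host[host].append(vm_name)
    return {
        host: sorted(vms)
        for host, vms in nodes_by_host.items()
        if len(vms) > 1
    }


# Hypothetical inventory: both nodes ended up on the same host.
violations = hosts_with_multiple_nodes(
    {"cp-node-1": "esxi-01", "cp-node-2": "esxi-01"}
)
print(violations)
```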
Size of the virtual machine that Check Point will run on. The vendor recommends 2 vCPUs and 6 GB of RAM, but that is a minimal configuration, for example a firewall with minimal throughput. In our deployment experience, when several software blades are in use, it is advisable to allocate at least 4 vCPUs and 8 GB of RAM.
We allocate about 150 GB of disk per node on average. When a virtual Check Point is deployed, the disk is partitioned, and we can adjust how much space goes to System Swap, System Root, Logs, and Backup and Upgrade.
If you increase System Root, increase the Backup and Upgrade partition as well to keep the proportion between them. If the proportion is not kept, the next backup may not fit on the disk.
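Keeping the proportion is simple arithmetic. A minimal sketch, assuming the ratio to preserve is the one between the current sizes of the two partitions; the sizes in the example are made up, not vendor defaults.

```python
def scaled_backup_size(root_gb, backup_gb, new_root_gb):
    """Size for Backup and Upgrade that keeps its current ratio to System Root."""
    if root_gb <= 0:
        raise ValueError("System Root size must be positive")
    ratio = backup_gb / root_gb
    return new_root_gb * ratio


# Hypothetical sizes: System Root grows from 20 GB to 30 GB,
# Backup and Upgrade is currently 40 GB.
print(scaled_backup_size(root_gb=20, backup_gb=40, new_root_gb=30))
```

With these numbers, Backup and Upgrade would need to grow to 60 GB.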
Disk provisioning: Thick Provision Lazy Zeroed. Check Point generates a lot of events and logs, with 1000 entries appearing every second. It is better to reserve space for them up front. To do this, we allocate the virtual machine's disk using thick provisioning, that is, we reserve space on physical storage when the disk is created.
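To get a feel for how much space such a log stream needs, you can do a rough estimate. The sketch below takes the 1000 entries per second mentioned above; the average entry size of 200 bytes is purely an assumption for illustration, so measure your own before sizing the Logs partition.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_log_volume_gb(entries_per_second, bytes_per_entry):
    """Approximate log volume per day in gigabytes (1 GB = 10**9 bytes)."""
    return entries_per_second * SECONDS_PER_DAY * bytes_per_entry / 1e9


def days_until_full(partition_gb, entries_per_second, bytes_per_entry):
    """How many days of logs fit into a partition of the given size."""
    return partition_gb / daily_log_volume_gb(entries_per_second, bytes_per_entry)


# 1000 entries per second; 200 bytes per entry is an assumed figure.
print(round(daily_log_volume_gb(1000, 200), 2), "GB per day")
```

At 1000 entries per second that is 86.4 million entries a day, so even small entries add up to many gigabytes quickly.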
100% resource reservation is configured for Check Point in case of migration between ESXi hosts. We recommend reserving 100% of the resources so that the virtual machine running Check Point does not compete for resources with other VMs on the host.
Miscellaneous. We use Check Point R77.30. For this version, the recommended guest OS for the virtual machine is Red Hat Enterprise Linux 5 (64-bit). For network adapters, use VMXNET3 or Intel E1000.
Check Point Settings
The latest Check Point updates are installed on the gateways and the management server. Check for updates via CPUSE.

Using Verifier, we verify that the service pack we are about to install does not conflict with the system.


Verifier is a good thing, of course, but there are nuances. Some updates are incompatible with add-ons, yet Verifier does not show these conflicts and lets the update proceed. At the end of the update you get an error, and only then do you learn what is blocking it. For example, this happened with the MABDA_001 service pack (Mobile Access Blade Deployment Agent), which fixes the launch of the Java plugin in browsers other than IE.
Daily automatic signature updates are configured for IPS and the other software blades. Check Point releases signatures that can be used to detect or block new vulnerabilities. Each vulnerability is automatically assigned a severity level. Based on this level and the configured filter, the system decides whether to detect or block on the signature. It is important not to overdo the filters: check them periodically and adjust them so that legitimate traffic is not blocked.

IPS profile, where we select the action with respect to the signature in accordance with its parameters.

The policy settings for this IPS profile are in accordance with the signature settings: severity level, performance impact, etc.
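The idea of choosing an action from a signature's parameters can be shown with a deliberately simplified model. This is not Check Point's actual logic or API: the level names, the thresholds, and the function are all assumptions made for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical scale used for both severity and performance impact."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def choose_action(severity, performance_impact,
                  block_from=Level.HIGH, max_impact=Level.MEDIUM):
    """Pick an action for a signature under a simplified profile filter.

    Block only signatures that are severe enough and cheap enough to
    inspect; everything else is just detected and logged.
    """
    if severity >= block_from and performance_impact <= max_impact:
        return "prevent"
    return "detect"


print(choose_action(Level.CRITICAL, Level.LOW))
print(choose_action(Level.MEDIUM, Level.LOW))
```

Tightening block_from or loosening max_impact in a model like this moves more signatures into blocking, which is exactly where legitimate traffic can start getting dropped.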
NTP time synchronization is configured on the Check Point equipment. According to the recommendations, Check Point should synchronize its time with an external NTP server. This can be set up through the Gaia web portal.
An inaccurate clock can cause the cluster to go out of sync. It also makes it extremely inconvenient to find the log entry you need, since every entry in the event logs carries a timestamp.


SmartEvent is configured for alerts about IPS, App Control, Anti-Bot, and so on. It is a separate module with its own license. If you have it, it is a convenient way to visualize information about the operation of all software blades and devices: for example, attacks, the number of IPS triggers, the severity of threats, and which prohibited applications users run.


Statistics for 30 days on the number of signatures and their severity.

More detailed information on the detected signatures on each software blade.
Monitoring
It is important to monitor at least the following parameters:
- cluster state;
- availability of Check Point components;
- CPU load;
- remaining disk space;
- free memory.
Check Point has a separate software blade - Smart Monitoring (separate license). In it, you can additionally monitor the availability of Check Point components, loads on individual blades, and license statuses.


Check Point load graph. The spike is a customer sending push notifications to 800 thousand clients.

The graph of the load on the Firewall blade in the same situation.
Monitoring can also be set up through third-party services. For example, we also use Nagios, where we monitor:
- network availability of the equipment;
- availability of the cluster address;
- CPU load per core. When load exceeds 70%, an email alert is sent. Such a high load may point to specific traffic (VPN, for example). If it happens often, there may not be enough resources and it is worth expanding the pool;
- free RAM. If less than 80% remains, we get notified;
- disk usage on certain partitions, for example /var/log. If a partition is about to fill up, it needs to be expanded;
- split brain (at the cluster level). We watch for the state where both nodes become active and synchronization between them is lost;
- high availability mode. We check that the cluster is in active-standby mode and look at the node states: active, standby, down.

Monitoring options in Nagios.
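Some of the checks from the list above can be sketched as a Nagios-style evaluation. The exit codes 0, 1 and 2 are the usual OK, WARNING and CRITICAL plugin codes; everything else (the function, the input format, and which condition maps to which severity) is an assumption for illustration, not our production plugin. Collecting the metrics from the gateway is left out.

```python
# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.
OK, WARNING, CRITICAL = 0, 1, 2

CPU_WARN_PCT = 70       # alert when a core is loaded above this
FREE_RAM_WARN_PCT = 80  # alert when free RAM drops below this


def evaluate(cpu_per_core, free_ram_pct, node_states):
    """Return (exit_code, messages) for one polling cycle.

    cpu_per_core: list of per-core load percentages
    free_ram_pct: free RAM as a percentage
    node_states:  dict of node name -> "active" / "standby" / "down"
    """
    status, messages = OK, []

    hot_cores = [i for i, load in enumerate(cpu_per_core) if load > CPU_WARN_PCT]
    if hot_cores:
        status = max(status, WARNING)
        messages.append(f"CPU above {CPU_WARN_PCT}% on cores {hot_cores}")

    if free_ram_pct < FREE_RAM_WARN_PCT:
        status = max(status, WARNING)
        messages.append(f"free RAM {free_ram_pct}% is below {FREE_RAM_WARN_PCT}%")

    active = [name for name, state in node_states.items() if state == "active"]
    if len(active) > 1:
        # Both nodes think they are active: split brain.
        status = CRITICAL
        messages.append(f"split brain: {sorted(active)} are all active")
    elif not active:
        status = CRITICAL
        messages.append("no active node")
    elif sorted(node_states.values()) != ["active", "standby"]:
        status = max(status, WARNING)
        messages.append(f"cluster is not active-standby: {node_states}")

    return status, messages


if __name__ == "__main__":
    # Hypothetical readings for a two-node cluster.
    code, msgs = evaluate([35, 82], 85, {"cp-node-1": "active", "cp-node-2": "standby"})
    print(code, msgs)
```

A real plugin would print a one-line summary and exit with the returned code so that Nagios can pick up the state.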
It is also worth monitoring the status of physical servers on which ESXi hosts are deployed.
Backup
The vendor recommends taking a snapshot immediately after installing updates (hotfixes).
Depending on how often changes are made, a full backup is scheduled once a week or once a month. In our practice, we make daily incremental copies of Check Point files and a full backup once a week.
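A schedule like ours (daily incremental, weekly full) comes down to a one-line rule. In this minimal sketch the choice of Sunday for the full backup is an assumption, and the function only decides the backup type; it does not run any Check Point backup command.

```python
import datetime

FULL_BACKUP_WEEKDAY = 6  # Sunday (Monday is 0); an assumed choice


def backup_type(day):
    """Return "full" on the weekly full-backup day, "incremental" otherwise."""
    if day.weekday() == FULL_BACKUP_WEEKDAY:
        return "full"
    return "incremental"


print(backup_type(datetime.date.today()))
```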
That's all. These are the most basic points to consider when deploying virtual Check Points, but even meeting this minimum will help you avoid problems in operation.