3CX Failover VoIP Cluster

A 3CX failover cluster consists of two replicated PBX servers. When the primary server fails, the replica takes over, minimizing telephony downtime. This article explains how to configure 3CX PBX failover properly.

Licensing


To use failover, you need one Enterprise (ENT) or Professional (PRO) license. With the ENT license, the time to live (TTL) of the A record for the 3CX server's FQDN is set to 5 minutes. With the PRO license, the A record TTL is set to 6 hours. This means that in the PRO edition, IP phones, 3CX clients, 3CX SBC and the web client will take significantly longer to reconnect after a failure.
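To get a feel for the difference, the worst-case delay before a client sees the new address can be estimated as the failure detection time plus the A record TTL. The sketch below is only an estimate: it assumes clients honor the TTL exactly, and the 60-second detection time is a hypothetical value, not a 3CX figure.

```python
from datetime import timedelta

# TTL values taken from the licensing description above.
TTL_BY_LICENSE = {
    "ENT": timedelta(minutes=5),
    "PRO": timedelta(hours=6),
}


def worst_case_reconnect(license_type: str, detection: timedelta) -> timedelta:
    """Upper bound on the time until a client resolves the new IP.

    Assumes the client cached the old A record just before the failure
    and respects the TTL exactly (an idealization).
    """
    return detection + TTL_BY_LICENSE[license_type]


if __name__ == "__main__":
    detection = timedelta(seconds=60)  # hypothetical detection time
    for name in TTL_BY_LICENSE:
        print(name, worst_case_reconnect(name, detection))
```

Real clients may reconnect sooner if their cached record is already close to expiring when the failure happens.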

Failover Implementation


3CX uses the active-passive cluster principle with configuration replication once every 24 hours. The primary (active) node performs VoIP call processing, and the backup (passive) node monitors the active host. If the active host fails (regardless of the reason), the passive host starts working from approximately the same state. The mechanism for determining the failure of an active host depends on the settings on the passive host and is discussed below.

Supported Network Topologies


The 3CX failover cluster is designed to operate in the following topologies:

  • Local Server (NAT)
  • Cloud (public server)


Failover between on-premises and cloud hosts is not officially supported. It is also assumed that the 3CX server FQDN is provided and maintained by 3CX. You can, of course, use a more complex topology and your own FQDN, but in that case managing DNS records and reconfiguring devices is the responsibility of the system administrator.
 

Prerequisites


Before setting up a failover cluster, install 3CX on both servers as follows:

  • One 3CX server in the cloud (first public IP) and a second 3CX server in the cloud (second public IP), using an FQDN issued by 3CX.
  • Both servers are installed with identical settings: the same FQDN, SSL certificate, and SIP, tunnel and web server ports.
  • In the auto-configuration tab of the IP phones, the interface must be specified as the FQDN, not as an IP address (see below).
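The prerequisites above can be turned into a quick pre-flight check. The following is a minimal sketch, not a 3CX tool: `NodeSettings`, its field names and `prerequisite_problems` are hypothetical, and the values would have to be collected by hand from each server's management console.

```python
import ipaddress
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class NodeSettings:
    # Hypothetical container; field names are illustrative, not a 3CX API.
    fqdn: str
    cert_fingerprint: str
    sip_port: int
    tunnel_port: int
    https_port: int


def is_ip_literal(host: str) -> bool:
    """Return True if host is an IPv4/IPv6 literal rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def prerequisite_problems(primary: NodeSettings, standby: NodeSettings,
                          provisioning_host: str) -> List[str]:
    """List every setting that differs between the nodes, plus a warning
    if phones are provisioned against an IP address instead of the FQDN."""
    problems = [
        f"{f.name} differs: {getattr(primary, f.name)!r} vs {getattr(standby, f.name)!r}"
        for f in fields(NodeSettings)
        if getattr(primary, f.name) != getattr(standby, f.name)
    ]
    if is_ip_literal(provisioning_host):
        problems.append("phones are provisioned by IP address, not FQDN")
    return problems
```

An empty list means the two nodes match on every compared field and the phones point at a host name.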

In other topologies, for example, one node in a private network and the other in the cloud, you must use scripts to update the DNS A records. Scripts must be run at the time of the failover. Below we will talk about this in more detail.

Configure the primary node


Assume that 3CX v15.5 is already installed and configured on the main (active) server.



For all extension numbers, specify the option to auto-configure IP phones via FQDN (Auto-configure phone tab, Select interface option).


In the Backup section, click the Location button and select Google Drive. You can choose a different location available from both servers. In our example, the cluster stores the configuration in the SIP3CXCOMBackups folder on Google Drive.



Click the Backup Plan button and select the data that you want to synchronize in the cluster, as well as the synchronization time. A daily overnight synchronization, as shown above, is recommended. The file name 3CXScheduledBackup.zip is reserved by 3CX for the most recently uploaded configuration and is used by both nodes of the cluster.



In the same section, click the Fault tolerance button, enable the Enable backup option and select the Backup switching mode - Basic.
This completes the configuration of the active cluster node.

Configure a backup node


Install 3CX on the standby server, taking into account the prerequisites listed above. Note that if the primary and backup servers use different public IP addresses, installing the backup server makes the cluster's shared FQDN resolve to the backup node (after the primary server was installed, it resolved to the primary's IP). To associate the FQDN with the IP address of the primary server again, open the Main section on the primary server, click the License link, and then click Change and OK. The FQDN will then point to the IP of the primary server again.
 

Go to the Backup section and click the Recovery plan button. Enable recovery and specify the recovery time (of course, it should be a bit later than the backup time). Then set the Do not start services after recovery option.
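The requirement that the restore runs after the backup can be sanity-checked with a little arithmetic on the two scheduled times. This is a minimal sketch; the 30-minute margin is an assumed value chosen for illustration, so allow enough time for your own backup to finish uploading.

```python
from datetime import date, datetime, time, timedelta


def gap_after_backup(backup_at: time, restore_at: time) -> timedelta:
    """Time from the daily backup to the next daily restore.

    If the restore time is not later on the clock than the backup time,
    the next restore happens on the following day.
    """
    day = date(2000, 1, 1)  # arbitrary reference day
    backup = datetime.combine(day, backup_at)
    restore = datetime.combine(day, restore_at)
    if restore <= backup:
        restore += timedelta(days=1)
    return restore - backup


def schedule_is_sane(backup_at: time, restore_at: time,
                     margin: timedelta = timedelta(minutes=30)) -> bool:
    """True if the restore starts at least `margin` after the backup and
    the restored data is at most 12 hours old when it is loaded."""
    gap = gap_after_backup(backup_at, restore_at)
    return margin <= gap <= timedelta(hours=12)
```

For example, a backup at 02:00 with a restore at 03:00 passes, while a restore at 01:00 would load a configuration that is already 23 hours old.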


In the same section, click the Fault tolerance button, enable the Enable backup option and set the Backup switching mode to Backup. Specify the IP address of the primary server (1.1.1.1 in our example) and the services to monitor: SIP Server, Web Server or Tunnel Server. Set the check interval and the switching logic: fail over when a single monitored service is down, or only when all of them are down.
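The monitoring logic described above (a check interval plus a "one service" or "all services" rule) can be illustrated with a small sketch. This is not how 3CX implements its checks: the plain TCP connect probe is a simplification, the port numbers are assumptions to be replaced with the ports configured on your PBX, and `monitor`, `tcp_probe` and `should_fail_over` are hypothetical names.

```python
import socket
import time
from typing import Callable, Dict, Optional

# Assumed ports for illustration; use the ports configured on your PBX.
SERVICES = {"SIP Server": 5060, "Web Server": 5001, "Tunnel Server": 5090}


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_fail_over(results: Dict[str, bool], mode: str) -> bool:
    """Decide on failover from per-service results (True = alive).

    mode "any": switch when at least one service is down.
    mode "all": switch only when every monitored service is down.
    """
    down = [name for name, alive in results.items() if not alive]
    if mode == "any":
        return len(down) > 0
    if mode == "all":
        return len(results) > 0 and len(down) == len(results)
    raise ValueError(f"unknown mode: {mode!r}")


def monitor(host: str, interval: float, mode: str,
            probe: Callable[[str, int], bool] = tcp_probe,
            on_failover: Callable[[], None] = lambda: None,
            max_checks: Optional[int] = None) -> bool:
    """Probe the primary every `interval` seconds; call on_failover once
    the rule in `mode` is met. Returns True if failover was triggered."""
    checks = 0
    while max_checks is None or checks < max_checks:
        results = {name: probe(host, port) for name, port in SERVICES.items()}
        if should_fail_over(results, mode):
            on_failover()
            return True
        checks += 1
        time.sleep(interval)
    return False
```

The "any" rule reacts faster but can trigger on a single misbehaving service, while "all" waits until the primary looks completely unreachable.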

If the backup server detects a failure of the primary, it takes over using the data from the last restored backup. In addition, it notifies the 3CX DNS (which is hosted on Google infrastructure) to change the A record of the FQDN to the IP address of the backup node.
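After a failover it can be useful to confirm that the FQDN really resolves to the backup node. The sketch below is one way to check this from a client machine; the host name and address in the example are placeholders, and a cached record may still return the old IP until its TTL expires.

```python
import socket
from typing import Callable


def fqdn_points_to(fqdn: str, expected_ip: str,
                   resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if the FQDN currently resolves to expected_ip.

    The resolver is injectable so the check can be tried without
    network access. Resolution errors count as "does not point there".
    """
    try:
        return resolve(fqdn) == expected_ip
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder values; substitute your cluster FQDN and backup node IP.
    print(fqdn_points_to("pbx.example.com", "203.0.113.10"))
```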

It is important that the failed primary server be completely shut down: if 3CX services are still running on it, they may conflict with the same services on the backup node.

It is worth noting that 3CX does not officially support the operation of FXO or FXS gateways in a failover cluster, since the interaction of gateways and server in such a topology depends on the characteristics of the vendor, the specific model and version of the firmware.

Scripts in complex topology


If you use your own FQDN in a LAN-LAN or LAN-Cloud topology, you must use special scripts (on Windows, these are PowerShell failover scripts for Active Directory) that run with certain privileges.
By default, scripts executed during the failover process run with the privileges of the 3CX Event Notification Manager service (Local System by default). As a rule, such a script needs DNS server management privileges (for example, to run dnscmd or psexec).



Open the service in the Windows Services snap-in and, on the Log On tab, change the account from Local System to one that has the privileges required to configure DNS, then restart the service. It is recommended that you create a separate user with this privilege and give it a password that does not expire.
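As an illustration of what such a script has to do, the sketch below builds the two dnscmd calls that replace an A record on a Windows DNS server. It is a minimal example under stated assumptions, not the script shipped with 3CX: the function names, server, zone and addresses are hypothetical, and by default the commands are only printed rather than executed.

```python
import subprocess
from typing import List


def build_dnscmd_commands(dns_server: str, zone: str, node: str,
                          old_ip: str, new_ip: str,
                          ttl_seconds: int) -> List[List[str]]:
    """Build dnscmd calls that move an A record from old_ip to new_ip."""
    return [
        # Delete the record pointing at the failed node (/f skips the prompt).
        ["dnscmd", dns_server, "/RecordDelete", zone, node, "A", old_ip, "/f"],
        # Add the record pointing at the backup node with the given TTL.
        ["dnscmd", dns_server, "/RecordAdd", zone, node,
         str(ttl_seconds), "A", new_ip],
    ]


def apply_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False.

    Running them requires an account with DNS management privileges.
    """
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    apply_commands(build_dnscmd_commands(
        "dc01.example.local", "example.local", "pbx",
        "192.0.2.10", "192.0.2.20", 300))
```

A short TTL on the record keeps the switchover time low, in the same way as the 5-minute TTL of the ENT license.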
