High website availability: geo-replicating site files with lsyncd
High availability of a website is a joint effort between the hosting provider and the website developer. The primary goal of high availability is to minimize planned and unplanned downtime.
High availability is more than just placing your project in a reliable cloud. A truly highly available site should run in several cloud regions, and its users should not notice anything even if one of those regions becomes unreachable. The website developer must ensure that the site stays operational even in an emergency. Highly available systems are redundant: if one provider fails, the site remains available. If replication fails, the site must still be accessible. If the developer needs to carry out work on a server or restart it, users should not notice.
In this series of articles we will look at ways to organize high availability for the various subsystems of your site. Many of these tasks have more than one solution. The author does not claim that the solution presented here is the best one, but it works and has been tested in practice. There is plenty of room for experimenting with availability.
Today we look at synchronizing a static site between cloud regions: changes to files on one server should appear on the other. We will also consider the simplest way to send the users of your site to an alternative server, which is suitable for this case: several DNS A records.
Lsyncd
Lsyncd (Live Syncing Daemon) is an application that mirrors server data in near real time for use in high availability clusters. It is particularly well suited to systems with low sync traffic. The application collects information about data changes through the Linux inotify kernel subsystem over a period defined in the configuration and then mirrors the changes (via rsync by default, but there are other options). By default, lsyncd runs as a daemon in the background and logs its actions through syslog. For testing and debugging, you can run it without daemonizing to see what is happening in the terminal.
Lsyncd does not use a separate file system or block device and does not significantly affect the performance of the local file system.
The rsync + SSH mode lets lsyncd perform file and directory moves directly on the remote server over SSH instead of re-transmitting the moved data to it.
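Because lsyncd learns about changes through inotify, and inotify needs one watch per watched directory, it can be worth comparing the size of the directory tree with the kernel's watch limit. The following is a minimal sketch; the function names are illustrative and are not part of lsyncd.

```shell
# Count the directories under a tree; each one needs its own inotify watch.
count_dirs() {
  find "$1" -type d | wc -l | tr -d ' '
}

# Print the kernel's per-user watch limit, or "unknown" if /proc is unavailable.
watch_limit() {
  limit_file=/proc/sys/fs/inotify/max_user_watches
  if [ -r "$limit_file" ]; then
    cat "$limit_file"
  else
    echo unknown
  fi
}

# Report both numbers and fail if the tree does not fit within the limit.
check_watch_budget() {
  dirs=$(count_dirs "$1")
  limit=$(watch_limit)
  echo "directories: $dirs, max_user_watches: $limit"
  if [ "$limit" != unknown ] && [ "$dirs" -ge "$limit" ]; then
    echo "warning: tree may exceed the inotify watch limit" >&2
    return 1
  fi
}
```

Once the servers are set up, something like check_watch_budget /var/www/html shows how far a site is from the limit. For a small static site the margin is normally very large.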
Installing and configuring geo-replication of web server files
Gaining access to different regions of InfoboxCloud
Order two InfoboxCloud subscriptions, one in Moscow and one in Amsterdam, to build a geo-distributed solution.
To tie both subscriptions to a single user account, proceed as follows:
1. Go to http://infoboxcloud.ru and order cloud infrastructure in any region (for example, Amsterdam). Then go to the control panel and order a cloud in another region (for example, Moscow).
2. After ordering, log out of the control panel and log in again. You can now choose the region you are working in from the upper right corner of the control panel.
Create 2 servers: one in Moscow, the other in Amsterdam.
Select CentOS 7 as the operating system. This article is written for it; you can use another Linux distribution if necessary, but the settings may differ. Any of the available virtualization types will do. The difference for this scenario is as follows: if you leave “enable OS kernel management” unchecked, you can use memory autoscaling for the servers, which uses resources more efficiently. If you check it, you can tune the inotify kernel subsystem, which is useful under high load but makes little sense for an ordinary small site. When creating each server, be sure to add one public IP address so that the servers can be reached from the external network.
After creating the servers, the access data will be sent to you by email.
DNS setup
For the primary domain, the one whose site you want to make highly available, create two DNS A records: one pointing to the server in Moscow and one pointing to the server in Amsterdam. In our case, the site is failover.trukhin.com.
Also create a service subdomain for each server, with an A record pointing only to that server. For example, failovermsk.trukhin.com points to the server in Moscow, and failoverams.trukhin.com points to the server in Amsterdam. Separate subdomains are needed so that, if a server fails, you can deploy another replica from a backup and point the subdomain at it.
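Before moving on, it may help to confirm that the records have propagated. The sketch below assumes the dig utility is installed (on CentOS it comes with the bind-utils package); the function names are illustrative.

```shell
# Count the IPv4 addresses in `dig +short` output read from stdin.
# Lines that are not bare addresses (for example CNAME targets) are ignored.
count_a_records() {
  grep -Ec '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || true
}

# Check that a name resolves to the expected number of A records.
check_a_records() {
  name=$1
  expected=$2
  found=$(dig +short A "$name" | count_a_records)
  echo "$name: $found A record(s), expected $expected"
  [ "$found" -eq "$expected" ]
}
```

With the names used in this article, check_a_records failover.trukhin.com 2 should succeed for the primary domain, and each service subdomain should report exactly one record.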
Server Configuration
The following steps must be performed on both servers.
Connect via SSH to both servers. On each server, install Apache, start it and enable it at boot:
yum -y update && yum install -y httpd && systemctl start httpd.service && systemctl enable httpd.service
Create the index.html file in the /var/www/html directory on each server and make sure the page opens correctly in a browser from each of them.
<html>
<head><title>Hi</title></head>
<body>Hello, World!</body>
</html>
For lsyncd to work, each server must be able to log in to the other with an SSH key.
Generate an SSH key (you can simply press Enter at each prompt):
ssh-keygen
From the server in Moscow, add the key to the server in Amsterdam:
ssh-copy-id root@failoverams.trukhin.com
From the server in Amsterdam, add the key to the server in Moscow:
ssh-copy-id root@failovermsk.trukhin.com
Now connect from the server in Moscow (root@failovermsk.trukhin.com) to the server in Amsterdam (root@failoverams.trukhin.com) and vice versa. No password should be requested. If you are asked to confirm the host key, answer yes.
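The same check can be scripted, which is convenient to repeat after a server has been redeployed. This is a sketch, not part of the original setup: it relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting, and the function names are illustrative.

```shell
# Succeed only if key-based login to "$1" works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
can_login_with_key() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Print a one-line verdict for each host given as an argument.
report_key_login() {
  for target in "$@"; do
    if can_login_with_key "$target"; then
      echo "$target: key login OK"
    else
      echo "$target: key login FAILED"
    fi
  done
}
```

On the Moscow server you would run report_key_login root@failoverams.trukhin.com, and on the Amsterdam server the same with root@failovermsk.trukhin.com. In batch mode ssh also refuses a host whose key has not been accepted yet, so do the first connection by hand.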
Install and configure lsyncd
The following steps must be performed on both servers.
To install lsyncd on CentOS 7, add the EPEL repository with the command:
rpm -ivh http://mirror.yandex.ru/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
Now install lsyncd:
yum install lsyncd
Create a directory to store logs and temporary lsyncd files:
mkdir -p /var/log/lsyncd && mkdir -p /var/www/temp
Create the lsyncd configuration file at /etc/lsyncd.conf:
settings {
    logfile    = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status",
    nodaemon   = false
}
sync {
    default.rsyncssh,
    source    = "/var/www/html",
    host      = "failovermsk.trukhin.com",
    targetdir = "/var/www/html",
    rsync = {
        archive      = true,
        compress     = true,
        temp_dir     = "/var/www/temp",
        update       = true,
        links        = true,
        times        = true,
        protect_args = true
    },
    delay = 3,
    ssh = {
        port = 22
    }
}
In host, replace failovermsk.trukhin.com with the service subdomain that points only to the server in the other region. The source parameter is the directory on the current server, and targetdir is the directory on the remote server. The delay parameter is the interval, in seconds, after which accumulated changes are synchronized. This value is chosen experimentally; the default is 10. All lsyncd parameters are described in the official documentation.
For debugging, set nodaemon = true and save the changes. Create a test file in the synchronized directory on one of the servers:
touch /var/www/html/test
Run lsyncd manually to verify that everything is syncing correctly.
lsyncd /etc/lsyncd.conf
If everything went well, the test file will appear in the /var/www/html directory on the server in the other region.
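Instead of looking at the remote directory by hand, you can compare checksums of the local and remote copies. This is a minimal sketch that assumes key-based SSH access is already working, that sha256sum is available on both servers, and that the path contains no spaces; the function names are illustrative.

```shell
# Print the SHA-256 checksum of a local file.
local_sum() {
  sha256sum "$1" | cut -d' ' -f1
}

# Print the SHA-256 checksum of the same path on a remote host.
remote_sum() {
  ssh -o BatchMode=yes "$1" sha256sum "$2" | cut -d' ' -f1
}

# Succeed only if the file exists on both sides with identical content.
# Usage: is_replicated <user@host> <path>
is_replicated() {
  l=$(local_sum "$2")
  r=$(remote_sum "$1" "$2")
  [ -n "$l" ] && [ "$l" = "$r" ]
}
```

For example, on the Amsterdam server: is_replicated root@failovermsk.trukhin.com /var/www/html/test && echo replicated. Allow at least the configured delay to pass after changing the file before checking.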
Stop the manually started lsyncd (Ctrl+C) and set nodaemon = false again in /etc/lsyncd.conf. Then start the service and enable it at boot:
systemctl start lsyncd.service
systemctl enable lsyncd.service
Reboot the server and make sure that data is still replicated afterwards.
2-way replication
For replication to work in the opposite direction, apply the same settings on the server in the other region, but specify the address of the first server in the lsyncd configuration file. Verify that data is replicated back. The lsyncd configuration above already sets the temporary directory temp_dir, which is required for two-way synchronization.
Two-way replication is not always needed. With MySQL, for example, master-master replication is not recommended, and if the first server fails with such a database, you will have to set up replication from the second, still working server to a third one. That is easy to do if a backup server template has been prepared in the cloud in advance, which will be covered in later articles. Today we work only with files, and for a static site two-way replication is perfectly applicable.
Check site availability
Open the site in a browser. With both servers running, the site is available.
Next, turn off the server in Moscow. The site is still available.
Turn the Moscow server back on and turn off the one in Amsterdam. The site is still available.
Why does it work?
When a name has several A records, modern browsers try one IP address first and, if it does not respond, move on to another. So as long as at least one server is available, the site keeps working.
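This fallback can be approximated from the command line, which is handy for testing without a browser. The sketch below uses curl's --resolve option to connect to a specific IP address while still requesting the real host name. The function names are illustrative, and the addresses in the usage note are placeholders, not the article's real servers.

```shell
# Probe one address: connect to IP "$2" but request host "$1" over HTTP.
# --resolve pins host:port to the given address for this request only.
probe() {
  curl --silent --fail --output /dev/null --max-time 5 \
       --resolve "$1:80:$2" "http://$1/"
}

# Try each address in order and print the first one that answers.
# Usage: first_reachable <host> <ip> [<ip> ...]
first_reachable() {
  host=$1
  shift
  for ip in "$@"; do
    if probe "$host" "$ip"; then
      echo "$ip"
      return 0
    fi
  done
  return 1
}
```

For example, first_reachable failover.trukhin.com 192.0.2.10 198.51.100.7 prints the first address that serves the page, or returns a non-zero status if none does. A real browser applies its own timeouts and ordering, so this only illustrates the idea.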
Publishing several A records for one name has long been used, in a more elaborate form, by large sites, which keep many redundant servers behind them. Google, for example, has 11 of them.
Conclusion
In this article, we looked at how to configure geo-replication for a static site without a database. In subsequent articles, we will look at how to replicate a database and provide high availability for more complex sites.
If you find a mistake in the article, the author will gladly correct it. Please report it by private message or e-mail. If you cannot leave comments on Habr, write them in the InfoboxCloud Community.
Good luck!