
AWS S3 High Availability FTP Server - Tutorial
Good afternoon, dear readers.
Once again I want to share my experience with you. On one of my projects, the goal was to set up an FTP server with increased reliability. By increased reliability I mean the following:
- Data is stored in AWS S3
- The FTP server itself (pure-ftpd was chosen) should be as highly available as possible
- Organize load balancing (optional)
Step One: Install s3fs and mount the S3 bucket as a disk partition.
There are not many options here, or rather just one (correct me if I'm wrong): s3fs. The s3fs developers claim on their page that "s3fs is stable and is being used in a number of production environments." There is no point in describing the s3fs installation process in detail, it is described here; I will dwell only on the really important points. First, the latest version of s3fs has problems with data synchronization: when you upload a new file to S3, it immediately appears on your server, but if you later change that file on S3, the old version still remains on the server. It is a caching problem, and attempts to mount the S3 bucket with various options for turning caching on and off did not help. After testing several s3fs releases, a version was found in which this bug does not manifest itself. Download the package, unpack it, and install it as described in the Makefile. For s3fs to work correctly, make sure the following packages are already installed on the system:
- fuse
- fuse-devel
- fuse-libs
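Once these are in place, the bucket is mounted like this (the access keys in the example are placeholders):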
#/usr/bin/s3fs mybucket /mnt/mybucket/ -o accessKeyId=XXXXXXXXXXXXX -o secretAccessKey=YYYYYYYYYYYYYYYYY -o allow_other,rw -o readwrite_timeout=120;
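Keep in mind that a new instance launched from the AMI (see Step Four) has to re-establish this mount at boot. The article does not describe how this was handled, so purely as a sketch, the same mount command can simply be repeated in a boot script such as /etc/rc.local:
# /etc/rc.local (sketch): remount the bucket at boot, before pure-ftpd starts
/usr/bin/s3fs mybucket /mnt/mybucket/ -o accessKeyId=XXXXXXXXXXXXX -o secretAccessKey=YYYYYYYYYYYYYYYYY -o allow_other,rw -o readwrite_timeout=120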
Step Two: Install pure-ftpd.
It would seem there is nothing interesting here: simply install it with any package manager. However, pure-ftpd is paranoid: before deleting a file, it first copies it to a new temporary file. When the file is several gigabytes in size, this procedure takes extra time, and in our case, with the data stored not locally but on S3, far from a short time.
To disable the creation of temporary files before deletion, I rebuilt pure-ftpd with the --without-sendfile option. Of course, it would be more correct to build a proper package and install it through the package manager, but I did it the quick way and did not get distracted.
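For reference, a minimal build sketch (the download URL and version are illustrative, not from the original setup; the --with-virtualchroot flag needed in Step Three is included here so the server only has to be built once):
# Build pure-ftpd from source with the two flags used in this tutorial:
# --without-sendfile avoids the temporary-file behaviour described above,
# --with-virtualchroot lets pure-ftpd follow symlinks out of the chroot (Step Three)
wget https://download.pureftpd.org/pub/pure-ftpd/releases/pure-ftpd-1.0.36.tar.gz
tar xzf pure-ftpd-1.0.36.tar.gz
cd pure-ftpd-1.0.36
./configure --without-sendfile --with-virtualchroot
make && make install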
Step Three: Configure User Rights.
This is one of the most interesting nuances. According to the customer's requirements, each user's home folder must contain directories that the user cannot delete or cannot write to. If we were dealing with regular disk partitions, we could simply change the owner of the folder. But in our case this will not work, since the permissions are inherited from the option the partition is mounted with (ro or rw). That is, the user can either do everything or only read everything. However, pure-ftpd has one useful feature: it can "follow" symlinks. To enable it, add one more option at build time: --with-virtualchroot. This way we can mount the bucket twice, in read-only and read-write modes, and create links to these mounts in the users' home directories.
#/usr/bin/s3fs mybucket /mnt/mybucketrw/ -o accessKeyId=XXXXXXXXXXXXX -o secretAccessKey=YYYYYYYYYYYYYYYYY -o allow_other,rw -o readwrite_timeout=120;
#/usr/bin/s3fs mybucket /mnt/mybucketro/ -o accessKeyId=XXXXXXXXXXXXX -o secretAccessKey=YYYYYYYYYYYYYYYYY -o allow_other,ro -o readwrite_timeout=120;
#mount | grep s3fs
s3fs on /mnt/mybucketro type fuse.s3fs (ro,nosuid,nodev,allow_other)
s3fs on /mnt/mybucketrw type fuse.s3fs (rw,nosuid,nodev,allow_other)
The user directory will look like this:
ls -la /mnt/Users/User1/
.
lrwxrwxrwx 1 root root 15 Mar 25 09:10 mybucketro/folder1 -> /mnt/mybucketro/folder1
lrwxrwxrwx 1 root root 15 Mar 25 09:10 mybucketrw/folder2 -> /mnt/mybucketrw/folder2
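These links can be created once on the server, for example like this (a sketch based on the listing above; the per-user subdirectories are my assumption):
# Create the per-user subdirectories and the symlinks into the ro/rw mounts
mkdir -p /mnt/Users/User1/mybucketro /mnt/Users/User1/mybucketrw
ln -s /mnt/mybucketro/folder1 /mnt/Users/User1/mybucketro/folder1
ln -s /mnt/mybucketrw/folder2 /mnt/Users/User1/mybucketrw/folder2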
Now we have given the user read access to the /mnt/mybucketro/folder1 folder and write access to the /mnt/mybucketrw/folder2 folder. At this stage, we can consider the first requirement (data is stored in AWS S3) completed.
Step Four: Configure High Availability.
Here it was decided to use the good old AWS Load Balancer and its wonderful Health Check.
Open the AWS Console and create a new load balancer (I am sure there is no need to walk through creating a balancer again; if needed, here's a reminder).
For Ping Protocol select TCP, and for Ping Port enter 21.
That's it: the server's health will now be checked by the availability of port 21, that is, of our FTP server.
We create an AMI from our server (on which FTP is already configured and the partitions are mounted). Then, as usual, we create a launch configuration from this AMI and an auto-scaling group.
When creating the auto-scaling group, we specify our new Load Balancer and the --health-check-type ELB option. With this configuration, if our FTP server "dies", the unhealthy instance is terminated and a new working one is launched in its place. Since all the data is stored on S3, this procedure does us no harm.
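For reference, roughly the same setup with the current AWS CLI could look like this (a sketch only; the group, AMI, key, balancer names, sizes and zones are placeholders, not values from the project):
# Launch configuration built from the prepared AMI
aws autoscaling create-launch-configuration \
    --launch-configuration-name ftp-lc \
    --image-id ami-xxxxxxxx \
    --instance-type m1.small \
    --key-name my-key
# Auto-scaling group attached to the ELB, with ELB health checks
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name ftp-asg \
    --launch-configuration-name ftp-lc \
    --load-balancer-names ftp-elb \
    --health-check-type ELB \
    --health-check-grace-period 300 \
    --min-size 1 --max-size 5 \
    --availability-zones us-east-1a us-east-1b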
Step Five (optional, your best practices are highly welcome): Set up load balancing and autoscaling.
Balancing the load on FTP is far from being solved as easily as, say, the load on the web. I faced this for the first time and, not finding a ready-made free solution, I suggested to the customer that we balance the load via DNS.
AWS Route53 supports a weight option for A records: the higher a record's weight, the more often it is returned in responses to clients.
That is, in theory, we can create five records with the same weight and thus evenly distribute client requests across five servers. To automate managing these records in AWS Route53, I made two scripts: one to add a record and another to remove it.
instance_up.sh:
#!/bin/bash
zone_id="Z3KU6XBKO52XV4"
dns_record="example.com."
instance_dns=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/public-hostname)
instance_ip=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
let number_nodes=$(route53 get $zone_id | grep $dns_record | wc -l)+1
weight="50"
id=$(date "+%Y_%m_%d_%H:%M")
# Add a record only if this instance's IP is not in the zone yet
route53 get $zone_id | grep $instance_ip > /dev/null
if [ $? -ne 0 ]; then
    # Re-create the existing weighted A records with the default weight
    route53 get $zone_id | grep $dns_record | awk '{print $4" "$3" "$6" "$7}' | sed 's/id=//' | sed 's/\,//' | sed 's/w=//' | sed 's/)//' | while read i; do
        route53 del_record $zone_id $dns_record A $i
        route53 add_record $zone_id $dns_record A $(echo $i | awk '{print $1" "$2" "$3}') $weight
    done
    # Add a weighted A record (TTL 60) for this instance
    route53 add_record $zone_id $dns_record A $instance_ip 60 $id $weight
fi
The scripts use the route53 utility, which comes with the python-boto package.
instance_down.sh:
#!/bin/bash
zone_id="Z3KU6XBKO52XV4"
dns_record="example.com."
instance_dns=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/public-hostname)
instance_ip=$(/usr/bin/curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
let number_nodes=$(route53 get $zone_id | grep $dns_record | wc -l)+1
weight="50"
id=$(date "+%Y_%m_%d_%H:%M")
# Remove this instance's A record if it is present in the zone
route53 get $zone_id | grep $instance_ip > /dev/null
if [ $? -eq 0 ]; then
    route53 del_record $zone_id $(route53 get $zone_id | grep $instance_ip | awk '{print $1" "$2" "$4" "$3" "$6" "$7}' | sed 's/id=//' | sed 's/\,//' | sed 's/w=//' | sed 's/)//')
fi
We place both scripts on the server from which we make the AMI and add calls to them to the pure-ftpd start-up script.
Now, on startup, pure-ftpd will add a new A record with its IP address to AWS Route53, and remove it when it is shut down.
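The article does not show the exact hook, but a minimal sketch of a wrapper around the init script could look like this (the paths are assumptions):
#!/bin/bash
# Sketch: register/deregister the Route53 record around pure-ftpd start/stop
case "$1" in
  start)
    /usr/local/bin/instance_up.sh        # add this instance's weighted A record
    /etc/init.d/pure-ftpd start
    ;;
  stop)
    /etc/init.d/pure-ftpd stop
    /usr/local/bin/instance_down.sh      # remove the record on shutdown
    ;;
esac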
It remains only to add ScaleUp and ScaleDown policies to our auto-scaling group.
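For example, with the current AWS CLI a ScaleUp policy plus the CloudWatch alarm that triggers it could look roughly like this (a sketch; names and thresholds are placeholders, and the ScaleDown side is symmetrical with --scaling-adjustment -1 and a low-CPU alarm):
# Simple scaling policy: add one instance when the alarm fires
aws autoscaling put-scaling-policy \
    --auto-scaling-group-name ftp-asg \
    --policy-name ftp-scale-up \
    --adjustment-type ChangeInCapacity \
    --scaling-adjustment 1
# CloudWatch alarm on the group's average CPU; use the PolicyARN returned above
aws cloudwatch put-metric-alarm \
    --alarm-name ftp-cpu-high \
    --namespace AWS/EC2 --metric-name CPUUtilization \
    --dimensions Name=AutoScalingGroupName,Value=ftp-asg \
    --statistic Average --period 300 --evaluation-periods 2 \
    --comparison-operator GreaterThanThreshold --threshold 70 \
    --alarm-actions <PolicyARN-of-ftp-scale-up>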
That's the whole setup. This configuration has been successfully working on the project for six months now.
If you have any questions - write comments, I will answer if possible. I will also be glad if someone shares their experience in organizing such systems.