Setting up a backup server: Linux, ZFS and rsync
TL;DR:
This article describes how to set up a Linux backup server. The storage is a ZFS file system with deduplication and compression enabled. Snapshots are taken daily and kept for a week (7 snapshots); monthly snapshots are kept for a year (12 more). Rsync is the transport: it runs as a daemon on the server and is started from crontab on the clients.
It so happens that I have a couple of servers running virtual machines under KVM. I wanted to back up the images of these machines over the network, subject to the following conditions:
- Keep all backups for the last week.
- Store monthly backups throughout the year.
- No third-party backup agents. On the clients, only standard software tested by generations of admins.
- Save space in storage. Compression and data deduplication are desirable.
- All files should be accessible without additional tools or shells. Ideally, each backup is a separate directory.
Can all this be combined? Yes, and quite simply.
All the computers in this article are servers. But it would be silly and long-winded to keep calling them “a server that stores backups” and “a server whose backups are stored by a server that stores backups”. So I will call the first simply the server and the second the client.
1. ZFS with compression and deduplication
Linux is the OS I know best. Everything here should apply unchanged to Solaris and FreeBSD, where ZFS has long been available out of the box. But Linux is closer to my heart, and the project porting ZFS to it already looks quite mature: in a year of experiments I had no noticeable problems with it. So I installed Debian Wheezy on the server, added the project's official repository and installed the necessary packages.
I created a pool on /dev/md1 and told ZFS to mount its file system at /mnt/backup:
# zpool create backup -m /mnt/backup /dev/md1
The device name /dev/md1 shows that I am using Linux software RAID. Yes, I know that ZFS has its own way of creating mirrors. But this machine already has one mirror (for the root partition) built with plain mdadm, so I preferred to use mdadm for the second mirror as well.
I enabled deduplication and compression and made the snapshot directory visible:
# zfs set dedup=on backup
# zfs set compression=on backup
# zfs set snapdir=visible backup
I put the snapshot script in /usr/local/bin:
#!/bin/bash
export LANG=C
ZPOOL='backup'
# Keep every snapshot for 7 days;
# keep the snapshot taken on the 4th of each month for a year.
# %Y is the calendar year (%g is the ISO week-based year and is wrong around New Year).
NOWDATE=$(date +%Y-%m-%d)           # today's date, YYYY-MM-DD
OLDDAY=$(date -d '-7 days' +%-d)    # day of the month 7 days ago, without padding
if [ "$OLDDAY" -eq 4 ]
then
    # the snapshot from 7 days ago is a monthly one: drop last year's instead
    OLDDATE=$(date -d '-1 year -7 days' +%Y-%m-%d)
else
    OLDDATE=$(date -d '-7 days' +%Y-%m-%d)    # the date 7 days ago
fi
/sbin/zfs snapshot "$ZPOOL@$NOWDATE"
# the snapshot may not exist yet, so errors are silenced
/sbin/zfs destroy "$ZPOOL@$OLDDATE" 2>/dev/null
I added this script to crontab so that it runs daily. For the contents of a snapshot to match its date, it is best to run the script near the end of the day, for example at 23:55.
The fourth day of the month was chosen almost by accident. I set all this up on the third of August and wanted to get, as soon as possible, a backup that would be kept for a year. The next day was the fourth.
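The rotation rule can be tried out without touching any pool. The sketch below is not part of the original script: `prune_target` is a hypothetical helper that takes a date and prints the name of the snapshot the rule would delete on that day. It assumes GNU date and snapshot names in YYYY-MM-DD form.

```shell
#!/bin/bash
# Hypothetical helper (not from the article's script): given "today" as
# YYYY-MM-DD, print the snapshot name the rotation rule would destroy.
# Assumes GNU date; -u keeps the arithmetic independent of local DST.
prune_target() {
    local today="$1" day_week_ago
    day_week_ago=$(date -u -d "$today -7 days" +%-d)
    if [ "$day_week_ago" -eq 4 ]; then
        # The snapshot from a week ago is a monthly one: keep it and
        # drop last year's snapshot for the same day instead.
        date -u -d "$today -1 year -7 days" +%Y-%m-%d
    else
        date -u -d "$today -7 days" +%Y-%m-%d
    fi
}

prune_target 2014-08-11   # prints 2013-08-04
prune_target 2014-08-12   # prints 2014-08-05
```

On the 11th the week-old snapshot is the monthly one, so it survives and the one from the 4th of the same month a year earlier is removed; on every other day the week-old snapshot itself goes.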
Snapshots appear in the /mnt/backup/.zfs/snapshot directory. Each snapshot is a separate directory named after the date on which it was created. Inside it is a full copy of the /mnt/backup directory as it was at that moment.
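Because snapshots are ordinary date-named directories, a few lines of shell are enough to see what is currently being kept. This is only a sketch: `summarize_snapshots` is an illustrative name, and it naively counts every snapshot dated the 4th as a monthly one, even while that snapshot is still within its first week.

```shell
#!/bin/bash
# Sketch: count the date-named snapshot directories under a snapshot root.
# A snapshot dated the 4th is counted as "monthly", everything else as "other".
summarize_snapshots() {
    local root="$1" path name monthly=0 other=0
    for path in "$root"/????-??-??; do
        [ -d "$path" ] || continue      # also skips the pattern itself if nothing matched
        name=${path##*/}                # strip the directory part
        if [ "${name##*-}" = "04" ]; then
            monthly=$((monthly + 1))
        else
            other=$((other + 1))
        fi
    done
    echo "monthly=$monthly other=$other"
}

summarize_snapshots /mnt/backup/.zfs/snapshot
```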
2. Rsync on the server
Traditionally, rsync is run over ssh. Key-based, passwordless authorization is set up on the clients, and the keys are installed on the backup server. The server then logs in to the clients over ssh and pulls the files. The advantage of this approach is that traffic is encrypted. But I don't like the idea of passwordless ssh logins (especially in light of the recent bash vulnerabilities). I also don't like initiating the backup from the server side: sometimes I want to run a script on the client before the backup (for example, to take a MySQL dump) and start the backup only after that script has finished. So my choice is rsync running as a daemon on the server and started from crontab on the clients.
I installed rsync on the server (the regular package from the repository) and, so that it starts at boot, wrote the following in /etc/default/rsync:
RSYNC_ENABLE=true
I created the following /etc/rsyncd.conf on the server:
uid = nobody
gid = nogroup
use chroot = yes
max connections = 10
pid file = /var/run/rsyncd.pid
[kvm01]
path = /mnt/backup/kvm01
comment = KVM01 backups
hosts allow = 192.168.xxx.xxx
hosts deny = *
read only = no
[kvm02]
path = /mnt/backup/kvm02
comment = KVM02 backups
hosts allow = 192.168.xxx.yyy
hosts deny = *
read only = no
192.168.xxx.xxx and 192.168.xxx.yyy are the addresses of the servers to be backed up. Their names are kvm01 and kvm02, and their files will go to /mnt/backup/kvm01 and /mnt/backup/kvm02. Therefore:
# mkdir /mnt/backup/kvm01
# mkdir /mnt/backup/kvm02
# chown nobody:nogroup /mnt/backup/kvm01
# chown nobody:nogroup /mnt/backup/kvm02
I started rsync:
# /etc/init.d/rsync start
3. Rsync on clients
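A minimal script for copying files from the kvm02 client to the server at 192.168.xxx.zzz looks something like this:
#!/bin/bash
# rsync module on the backup server and the local directory to copy
RSYNCBACKUPDIR="rsync://192.168.xxx.zzz/kvm02"
LOCALDIR="/virt/files"
# -rlptD: recurse, keep symlinks, permissions, times, devices and special files;
# --delete removes files on the server that no longer exist locally
rsync -vrlptD --delete "$LOCALDIR" "$RSYNCBACKUPDIR"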
Of course, when backing up virtual machines, this script has to be extended with commands for creating and removing LVM snapshots, mounting and unmounting their contents, and so on. But that topic is beyond the scope of this article.
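The "prepare first, copy afterwards" order that motivated client-side scheduling can be sketched as a small wrapper. Everything named here is illustrative rather than taken from the article: `backup_after` is a hypothetical function, `RSYNC_BIN` is an assumed override that makes dry runs possible, and the preparation script path in the example is made up.

```shell
#!/bin/bash
# Sketch: run a preparation command on the client and start rsync only if
# it succeeded. RSYNC_BIN and backup_after are illustrative names.
RSYNC_BIN="${RSYNC_BIN:-rsync}"

backup_after() {
    # $1 = preparation command (no arguments), $2 = local dir, $3 = rsync URL
    local prepare="$1" src="$2" dest="$3"
    if ! "$prepare"; then
        echo "preparation failed, backup skipped" >&2
        return 1
    fi
    "$RSYNC_BIN" -rlptD --delete "$src" "$dest"
}

# Example (hypothetical script name):
# backup_after /usr/local/bin/dump-mysql.sh /virt/files rsync://192.168.xxx.zzz/kvm02
```

Setting RSYNC_BIN=echo before the call prints the rsync arguments instead of transferring anything, which is a cheap way to check the wrapper before putting it into crontab.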
4. Recovery
To restore files from the kvm01 client's backup for August 4, 2014, it is enough to go to the /mnt/backup/.zfs/snapshot/2014-08-04/kvm01/ directory on the server and copy the files from there in any usual way. Each backup looks like a regular read-only directory. To look for a particular file in a backup, you can use standard utilities such as find or grep.
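As an illustration of restoring with nothing but standard tools, here is a minimal sketch. `restore_files` and its arguments are hypothetical, and the `*.img` pattern in the example is an assumption about how the image files are named.

```shell
#!/bin/bash
# Sketch: copy files matching a name pattern out of one client's directory
# in a given snapshot. restore_files and its arguments are illustrative.
restore_files() {
    local snaproot="$1" day="$2" client="$3" pattern="$4" dest="$5"
    local src="$snaproot/$day/$client"
    if [ ! -d "$src" ]; then
        echo "no backup found at $src" >&2
        return 1
    fi
    mkdir -p "$dest"
    # cp -p preserves mode and timestamps; files with the same name
    # in different subdirectories overwrite each other in $dest.
    find "$src" -type f -name "$pattern" -exec cp -p {} "$dest"/ \;
}

# Example (the pattern is an assumption):
# restore_files /mnt/backup/.zfs/snapshot 2014-08-04 kvm01 '*.img' /tmp/restore
```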
5. Conclusion
The server now holds 9 snapshots: 7 daily and 2 monthly, plus today's backup, whose snapshot will be taken in the evening. The backup partition is 1.8T in size. The files total 3.06T, but they physically occupy 318G on disk. Today's backup alone totals 319G. Yes, 10 backups on ZFS with compression and deduplication take up less space than a single backup would on a file system without these useful features.
# zpool list
NAME     SIZE  ALLOC   FREE  CAP   DEDUP  HEALTH  ALTROOT
backup  1.80T   310G  1.49T  16%  10.37x  ONLINE  -
# zfs list
NAME     USED  AVAIL  REFER  MOUNTPOINT
backup  3.06T  1.42T   318G  /mnt/backup
Since rsync itself does not encrypt the data it transfers, it is unsafe to expose this setup to the Internet unchanged. Encryption can be added by sending the traffic through ipsec or stunnel, for example.
I wrote above that I had no noticeable problems with ZFS. In fact, there was one. One night, when both clients were backing up at the same time, the server twice reported in dmesg that the rsync task had been blocked for more than 120 seconds. Both backups nevertheless completed successfully, nothing hung and no data was lost. I suspect this was a manifestation of the famous bug 12309. I spread the backups out in time, and the problem has not recurred since.