Comparison of backup methods
Preparing a new server for production should begin with setting up backups. Everyone seems to know this, yet even experienced system administrators sometimes make unforgivable mistakes. The problem is not only that a new server must be set up very quickly, but also that it is often unclear which backup method to use.
Of course, it is impossible to create an ideal way that would suit everyone: everywhere there are pros and cons. But at the same time, it seems quite realistic to choose a method that is most suitable for the specifics of a particular project.
When choosing a backup method, you must first pay attention to the following criteria:
- Speed (time) of backing up to storage;
- Speed (time) of recovery from a backup;
- How many copies can be kept given a limited storage size (on the backup storage server);
- The risks of inconsistent backups, of an unreliable backup procedure, and of complete or partial loss of backups;
- Overhead: the load created on the server during copying, reduced service response speed, etc.;
- The cost of renting all the services used.
In this article, we will talk about the basic methods of backing up servers running Linux systems and the most common problems that newcomers to this very important area of system administration may encounter.
The scheme of organization of storage and recovery from backups
When choosing a scheme for organizing a backup method, you should pay attention to the following basic points:
- Backups must not be stored in the same place as the data being backed up. If you keep a backup on the same disk array as your data, you will lose both if the main disk array is damaged.
- Mirroring (RAID1) is no substitute for backup. RAID protects you only from a hardware failure of one of the disks (and sooner or later such a failure will occur, since the disk subsystem is almost always the bottleneck on a server). In addition, with hardware RAID there is the risk of controller failure, which means you need to keep a spare controller of the same model.
- If you store backups within one rack, or simply within a single DC, certain risks remain: an incident can affect the entire rack or DC.
- If you store backups in different DCs, network costs and the recovery time from the remote copy increase sharply.
Often the reason for data recovery is damage to the file system or disks. That is, backups need to be kept on a separate storage server. In that case, the bandwidth of the data transmission channel can become a problem. If you have a dedicated server, it is highly desirable to back up over a separate network interface rather than the one that exchanges data with clients. Otherwise your clients' requests may not fit into the limited communication channel, or client traffic will prevent backups from completing on time.
Next, think through the scheme and timing of data recovery from the point of view of the backup storage. You may be quite happy that a backup completes in 6 hours at night to a storage with limited access speed, but a 6-hour recovery is unlikely to suit you. That means access to the backups must be convenient and the data must be transferred back quickly enough. For example, restoring 1 TB of data over a 1 Gbit/s channel takes almost 3 hours, and that is only if you do not run into the performance of the disk subsystem in the storage or on the server. And do not forget to add the time to detect the problem, the time to decide on a rollback, the time to verify the integrity of the recovered data, and the amount of subsequent dissatisfaction of clients and colleagues.
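The back-of-the-envelope arithmetic behind that figure can be checked in a couple of lines of shell (the numbers are illustrative; real transfers add protocol overhead):

```shell
# Rough restore-time estimate: 1 TB over a 1 Gbit/s channel
# (decimal units, as network gear advertises them).
tb_bits=$((8 * 1000000000000))   # 1 TB = 8 * 10^12 bits
link_bps=1000000000              # 1 Gbit/s
seconds=$((tb_bits / link_bps))
echo "$seconds s = $((seconds / 3600)) h $((seconds % 3600 / 60)) min"
# prints: 8000 s = 2 h 13 min
```

Protocol overhead and disk contention push the real figure toward the 3 hours mentioned above.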
Incremental backup

With incremental backups, only files that have changed since the previous backup are copied; each subsequent incremental backup adds only the files modified since the one before it. On average, incremental backups take less time because fewer files are copied. However, recovery takes longer: the last full backup must be restored, plus all subsequent incremental backups. Unlike differential copying, changed or new files do not replace the old ones but are added to the media independently.

Incremental copying is most often done with the rsync utility. It can save storage space if the number of changes per day is not very large. Note that if the changed files are large, they are copied in full, without replacing previous versions.
The backup process using rsync can be divided into the following steps:
- A list of files is compiled on the server being backed up and in the storage; for each file, metadata (permissions, modification time, etc.) or a checksum (with the --checksum option) is read.
- If the metadata differs, the file is split into blocks and a checksum is computed for each block. Only the differing blocks are uploaded to the storage.
- If a file changes while its checksums are being computed or while it is being transferred, its backup is retried from the beginning.
- By default, rsync transfers data over SSH, which means each data block is additionally encrypted. rsync can also run as a daemon and transfer data over its own protocol without encryption.
More information about rsync can be found on the official website.
For each file rsync performs a very large number of operations. If there are a lot of files on the server or if the processor is heavily loaded, then the backup speed will be significantly reduced.
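As an illustration of how incremental copies are often organized with rsync, here is a minimal sketch using --link-dest: unchanged files are stored as hard links to the previous snapshot, so each snapshot looks like a full copy but only changed files consume space. The directory layout and the "latest" symlink are our own conventions, not part of rsync:

```shell
# Sketch of hard-link-based incremental backups with rsync.
# Paths and the "latest" symlink convention are illustrative assumptions.
incremental_backup() {
    src="$1"            # e.g. /home/
    backup_root="$2"    # e.g. /backups on the storage mount
    stamp="$(date +%Y-%m-%d_%H%M%S)"
    dest="$backup_root/$stamp"
    mkdir -p "$backup_root"
    if [ -e "$backup_root/latest" ]; then
        # Unchanged files become hard links into the previous snapshot.
        rsync -a --delete --link-dest="$backup_root/latest" "$src" "$dest"
    else
        rsync -a "$src" "$dest"     # first run: plain full copy
    fi
    ln -sfn "$dest" "$backup_root/latest"   # repoint "latest" at the new copy
}
```

A call like `incremental_backup /home/ /backups` then produces date-stamped directories that each look like a full copy while sharing unchanged files.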
From experience, we can say that problems on SATA disks (RAID1) begin at roughly 200 GB of data on the server. In reality everything depends on the number of inodes, and in each case this threshold can shift in either direction.

Beyond a certain point, the backup will take very long or simply will not complete within a day.
To avoid comparing all files, there is lsyncd. This daemon collects information about changed files via inotify, so a ready-made list of them is available for rsync. Keep in mind, however, that it places additional load on the disk subsystem.
Differential backup

With differential backups, every file that has changed since the last full backup is copied each time. Differential backup speeds up recovery: all you need are the last full and the last differential backup. Its popularity is growing, since each copy reflects the files at a specific point in time, which matters a great deal, for example, after a virus infection.
Differential backups are performed, for example, using a utility such as rdiff-backup. When working with this utility, the same problems arise as with incremental backups.
In general, if a complete enumeration of files is performed when searching for a difference in the data, the problems of this kind of backup are similar to the problems with rsync.
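For reference, a minimal rdiff-backup session might look like the sketch below, wrapped in hypothetical helper functions (the names, paths, and retention times are our own examples). The repository keeps a mirror of the latest state plus reverse increments, so restoring the current state is fast while older states are reconstructed on demand:

```shell
# Hypothetical wrappers around rdiff-backup; the helper names are ours.
rdiff_backup_run() {          # usage: rdiff_backup_run <src-dir> <repo>
    rdiff-backup "$1" "$2"
}

rdiff_backup_restore() {      # usage: rdiff_backup_restore <repo-path> <dest> <time>
    # <time> accepts "now", intervals such as "3D", or timestamps.
    rdiff-backup -r "$3" "$1" "$2"
}
```

For example, `rdiff_backup_run /home /backups/home` followed later by `rdiff_backup_restore /backups/home/user/report.txt /tmp/report.txt 3D` pulls back a single file as it was three days ago.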
We would like to note separately: if your backup scheme copies each file individually, it is worth deleting or excluding files you do not need, such as CMS caches. Such caches usually contain a great many small files whose loss does not affect the correct operation of the server.
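A sketch of such an exclusion with rsync (the cache paths are examples; adjust them to your CMS):

```shell
# Back up a web root while skipping cache directories: losing them does
# not affect correct operation, and they often hold thousands of tiny
# files that slow per-file backup down. Patterns are illustrative.
backup_www() {
    rsync -a \
        --exclude='var/cache/' \
        --exclude='cache/' \
        --exclude='tmp/' \
        "$1" "$2"
}
```

Usage: `backup_www /var/www/ /backups/www/`.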
Full backup

A full backup usually covers the entire system and all files. Weekly, monthly, and quarterly schedules involve creating a complete copy of all data; it is usually made on Fridays or over the weekend, when copying a large volume of data does not affect operations. Subsequent backups, from Monday through Thursday until the next full backup, can be differential or incremental, mainly to save time and media space. A full backup should be made at least weekly.
Most publications on the topic recommend a full backup once or twice a week, with incremental and differential backups the rest of the time. Such advice has its reasons: in most cases a full backup once a week is enough. Repeating it more often makes sense if you cannot, on the storage side, update the full backup and guarantee its correctness (for example, if for some reason you do not trust your backup scripts or software).

In fact, full backup can be divided into two types:
- Full backup at the file system level;
- Full device level backup.
Let us consider their characteristic features with an example:
root@komarov:~# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/komarov_system-root  3.4G  808M  2.4G  25% /
/dev/mapper/komarov_system-home  931G  439G  493G  48% /home
udev                             383M  4.0K  383M   1% /dev
tmpfs                            107M  104K  107M   1% /run
tmpfs                            531M     0  531M   0% /tmp
none                             5.0M     0  5.0M   0% /run/lock
none                             531M     0  531M   0% /run/shm
/dev/xvda1                       138M   22M  109M  17% /boot
We will back up only /home. Everything else can be quickly restored manually. You can also deploy the server with a configuration management system and attach the restored /home to it.
Full file system level backup
Typical Representative: dump.
The utility creates a “dump” of the file system. You can create not only a full, but also an incremental backup. dump works with the inode table and “understands” the file structure (for example, sparse files are compressed).
Dumping a mounted, active file system is "stupid and dangerous": the file system can change while the dump is in progress. The dump must be taken from a snapshot (a little later we will discuss working with snapshots in more detail), or from an unmounted or frozen FS.

This scheme also depends on the number of files, and its execution time grows with the amount of data on the disk. At the same time, dump is faster than rsync.

If you need to restore not the entire backup but, for example, only a couple of accidentally corrupted files, extracting them with the restore utility may take too long.
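A minimal sketch of a dump/restore round trip, wrapped in hypothetical helpers (device and output paths are illustrative; as noted above, the source should be a snapshot, unmounted, or frozen FS, and both commands require root):

```shell
# Hypothetical wrappers; the function names are ours.
dump_fs() {               # usage: dump_fs <filesystem/device> <dump-file>
    # -0: full (level 0) dump; -u: record it in /etc/dumpdates;
    # -f: write the dump to the given file.
    dump -0u -f "$2" "$1"
}

restore_interactive() {   # usage: restore_interactive <dump-file>
    # Walk the dump like a directory tree and extract selected files.
    restore -i -f "$1"
}
```

For example: `dump_fs /dev/vg/home_snap /backups/home.dump`, then `restore_interactive /backups/home.dump` to pull out individual files.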
Full device level backup
- mdraid and DRBD
In essence, a RAID1 is assembled from a disk (or array) on the server and a network-attached drive, and periodically (at the backup frequency) the additional disk is synchronized with the main disk or array on the server.
The biggest plus is speed. The duration of the synchronization depends only on the number of changes made on the last day.
Such a backup scheme is used quite often, but few realize that the resulting backups may be unusable, and here is why. When disk synchronization completes, the backup disk is detached. If, for example, we have a running DBMS that writes data to the local disk in batches and keeps intermediate data in its cache, there is no guarantee that the data will reach the backup disk at all. At best, we lose some of the changing data. Therefore such backups can hardly be considered reliable.
- LVM + dd
Snapshots are a great tool for creating consistent backups. Before creating a snapshot, you need to flush the cache of the FS and your software to the disk subsystem.
For example, for MySQL it would look like this:

$ sudo mysql -e 'FLUSH TABLES WITH READ LOCK;'
$ sudo mysql -e 'FLUSH LOGS;'
$ sudo sync
$ sudo lvcreate -s -pr -l100%free -n %s_backup /dev/vg/%s
$ sudo mysql -e 'UNLOCK TABLES;'
* Colleagues tell stories of how the global read lock sometimes led to deadlocks, but in my memory this has never happened.
Next, you can copy the snapshot to storage. The main things are to make sure the snapshot does not self-destruct during copying, and not to forget that while a snapshot exists, write speed drops severalfold.
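One way to move the snapshot to storage in compressed form is a plain dd-to-gzip pipeline. The sketch below writes to stdout so that the destination (ssh, netcat, a mounted share) is up to the caller; the function name and device path are our own examples:

```shell
# Stream a snapshot block device through gzip: compression cuts channel
# load and storage space. Output goes to stdout for the caller to route.
stream_snapshot() {       # usage: stream_snapshot <block-device>
    dd if="$1" bs=4M status=none | gzip
}
```

For example: `stream_snapshot /dev/vg/home_backup | ssh storage 'cat > /backups/home.img.gz'`, after which the snapshot should be removed with lvremove before it fills up.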
DBMS backups can be created separately (for example, using binary logs), thereby avoiding the downtime while caches are being flushed. You can also create dumps in the storage by running a DBMS instance there. Backing up different DBMSs is a topic for separate publications.
You can copy a snapshot with resumable transfers (for example, rsync with a patch for copying block devices: bugzilla.redhat.com/show_bug.cgi?id=494313), or block-by-block without encryption (netcat, ftp). You can transfer blocks in compressed form and mount them in storage via AVFS, and mount the partition with backups on the server via SMB.
Compression removes the problems of transfer speed, channel saturation, and storage space. However, if you do not use AVFS in the storage, restoring only part of the data will take a lot of time; and if you do use AVFS, you will run into its immaturity.

An alternative to block-level compression is squashfs: for example, you can mount a partition from the server via Samba and run mksquashfs. But this utility also works at the file level, i.e. it depends on the number of files. In addition, creating a squashfs consumes a lot of RAM, which can easily trigger the oom-killer.
You need to protect yourself against the situation where the storage or your server is hacked. If the server is hacked, the user who writes data to the storage should have no rights to delete or modify files there. If the storage is hacked, it is likewise desirable to keep the backup user's rights on the server to a minimum. If the backup channel can be eavesdropped on, encryption is required.
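One example of limiting the write-side key on the storage server is an authorized_keys entry with a forced command that confines the key to rsync inside a single directory, via the rrsync helper shipped with rsync (the path and key below are placeholders). Note that this limits, but does not by itself prevent, modification of already-written copies; versioned or append-only storage is still needed for that:

```
command="rrsync /backups/web1",no-port-forwarding,no-agent-forwarding,no-pty,no-X11-forwarding ssh-ed25519 AAAA... backup@web1
```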
Each backup system has its own disadvantages and advantages. In this article, we tried to highlight some of the nuances when choosing a backup system. We hope that they will help our readers.
As a result, when choosing a backup system for your project, you need to conduct tests of the selected type of backup and pay attention to:
- backup time at the current stage of the project;
- backup time when there is many times more data;
- channel load;
- load on the disk subsystem, both on the server and in the storage;
- time to recover all data;
- time to recover a couple of files;
- whether data consistency is required, especially for databases;
- memory consumption and any oom-killer invocations.
As backup solutions, you can use supload and our cloud storage.