
Advantages of the new virtual machine backup method over classic schemes
- How to choose the optimal backup scheme for virtual servers?
- Is it always worth using the option that backup programs set by default?
- What are the differences in efficiency and reliability between the main backup algorithms of virtual machines?
- What backup method can get around the disadvantages of classic backup algorithms?
Let's take a closer look.
The ordinary forward incremental method is usually the default setting and is therefore used most often. The first run creates a full backup, and every later run adds an increment to the chain. To keep such a chain reliable and to limit recovery time (which grows linearly with the number of increments), a new full backup, either a regular or a synthetic one, must be created periodically. The number of increments after which a full backup is re-created is specified in the backup scheme settings. Schematically, the process looks like this:

The forward method provides high data processing (I/O) speed, since it requires only one read/write operation for each stored data block. Creating an increment takes little time and the virtual machine snapshot lives only briefly, which minimizes the load on the production environment. However, storage consumption will be significant because redundant data is kept. Why?
In practice, companies usually set a retention policy that defines either the number of available recovery points (full copies and increments) or the calendar storage time. In this case, the forward incremental scheme must satisfy two conditions:
- The backup chain must be recoverable, that is, it must include a full backup and all subsequent increments. If part of the chain is deleted, data can no longer be restored from that backup.
- The number of available recovery points must never fall below the number specified by the user.
Suppose the retention period is 7 days. Suppose a complete chain of 7 restore points has already been created, followed by the next full backup and, say, a couple of increments on top of it. Can the first chain be deleted? No: deleting it would leave only 3 recovery points, which violates the second condition above. It turns out that outdated recovery points can be removed no earlier than after 14 days, hence the excess storage.
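The arithmetic above can be checked with a small simulation. This is a minimal sketch and not the retention logic of any real product: it assumes one backup per day, a new full backup every `full_every` runs, and that an old chain can only be deleted as a whole once the newer chains alone hold at least `retention` recovery points. The function name and its parameters are hypothetical.

```python
def peak_points_stored(retention: int, full_every: int, days: int) -> int:
    """Simulate forward (direct) incremental retention.

    Returns the largest number of recovery points that had to be kept
    on storage at the same time.
    """
    chains: list[int] = []  # each entry: points in one chain (full + increments)
    peak = 0
    for day in range(days):
        if day % full_every == 0:
            chains.append(1)    # a new full backup starts a new chain
        else:
            chains[-1] += 1     # an increment extends the current chain
        # The new point is written before anything is deleted,
        # so measure storage usage before cleanup.
        peak = max(peak, sum(chains))
        # Condition 1: a chain is only usable as a whole, so delete it whole.
        # Condition 2: the remaining chains must still hold `retention` points.
        while len(chains) > 1 and sum(chains[1:]) >= retention:
            chains.pop(0)
    return peak


if __name__ == "__main__":
    print(peak_points_stored(retention=7, full_every=7, days=30))
```

Under these assumptions, a 7-point retention policy with a weekly full backup needs room for 14 recovery points, including two full backups, at its peak.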
The reverse incremental method avoids this waste of disk space. The mechanism for creating such backups is a little more complicated: each fresh increment is injected into the originally created full backup, and the data blocks it replaces in the full copy are saved as a reverse increment that precedes the full backup in the chain.

First, the reverse incremental method uses storage more efficiently, because there is always exactly one full backup plus a chain of preceding increments (the "extra" increments are deleted regularly according to the configured retention period). Second, restoring the latest state from a backup created by the reverse method takes minimal time, since the full copy always contains the most current version of the data and no increments need to be processed.
However, this algorithm has its own drawback: data processing speed decreases and the snapshot's lifetime increases. Each saved data block requires 3 I/O operations: read the block being displaced from the full copy, write that block to storage as part of the reverse increment, and then write the new block of changed data into the full copy. As a result, if the storage system cannot sustain this level of performance, the backup will take a long time, and long-lived snapshots will increase the load on the production environment.
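The three operations per block can be illustrated with a toy model in which a backup file is a dictionary mapping block numbers to block contents. This is a simplified sketch under that assumption, not the on-disk format or API of any product; `reverse_incremental_run` and `restore_previous` are hypothetical names.

```python
from typing import Optional

Blocks = dict[int, Optional[bytes]]


def reverse_incremental_run(full: Blocks, changed: Blocks) -> tuple[Blocks, int]:
    """Inject changed blocks into the full backup (modified in place).

    Returns the reverse increment and the number of I/O operations used.
    """
    reverse_increment: Blocks = {}
    io_ops = 0
    for block_id, new_data in changed.items():
        old_data = full.get(block_id)            # 1) read the displaced block
        io_ops += 1
        reverse_increment[block_id] = old_data   # 2) write it to the reverse increment
        io_ops += 1
        full[block_id] = new_data                # 3) write the new block into the full copy
        io_ops += 1
    return reverse_increment, io_ops


def restore_previous(full: Blocks, reverse_increment: Blocks) -> Blocks:
    """Rebuild the previous recovery point from the full copy and one reverse increment."""
    state = dict(full)
    for block_id, old_data in reverse_increment.items():
        if old_data is None:
            state.pop(block_id, None)   # the block did not exist at the earlier point
        else:
            state[block_id] = old_data
    return state


if __name__ == "__main__":
    full = {0: b"a", 1: b"b", 2: b"c"}
    rev, ops = reverse_incremental_run(full, {1: b"B", 2: b"C"})
    print(ops, full, restore_previous(full, rev))
```

In this model the latest state is simply `full`, which is why restoring the newest point needs no increment processing, while every changed block costs three operations during the backup run.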
Avoiding Compromise
Veeam Backup & Replication v8 introduced the "forever forward incremental" backup method, which combines the strengths of the algorithms discussed above and delivers fast backups, fast recovery, and economical use of storage at the same time.
With the forever forward incremental method, a full backup and a chain of subsequent increments are created and kept until the specified retention limit is reached (let it be N days). On day N the last increment of the chain is written, and on the next run of the backup job the following happens:
- The next increment is written as the latest recovery point in the chain (using one read/write cycle, as in the forward incremental method). At the same time, the backup program detects that the number of recovery points now exceeds the configured limit by 1;
- Next, the program determines that the oldest file is the full backup saved at the beginning of the chain, and that the oldest increment is the one that immediately follows it (one backup cycle newer than the full backup);
- The oldest increment in the chain is injected into the full backup file, replacing the corresponding obsolete data blocks. This takes 2 I/O operations per block: one to read the data from the increment and one to write it into the full backup file;

- The increment file is deleted from the chain, and the updated full backup takes its place. In effect, the oldest full copy gradually "absorbs" the increments whose retention period has expired.

From then on, this operation is repeated every time a new increment is added to the chain.
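The steps above can be sketched with a toy model in which a backup file is a dictionary of block numbers to block contents. This is only an illustration under that assumption and not Veeam's implementation; `forever_forward_run`, `restore_latest`, and the `retention` parameter (counted in recovery points, including the full backup) are hypothetical.

```python
Blocks = dict[int, bytes]


def forever_forward_run(full: Blocks, increments: list[Blocks],
                        changed: Blocks, retention: int) -> int:
    """Run one backup job; returns the I/O operations spent on merging."""
    # Write the new increment as the latest recovery point
    # (one operation per block, performed while the snapshot is open).
    increments.append(dict(changed))
    merge_ops = 0
    # The full backup counts as one recovery point.
    while 1 + len(increments) > retention:
        oldest = increments.pop(0)          # the increment right after the full backup
        for block_id, data in oldest.items():
            full[block_id] = data           # read from increment + write into full
            merge_ops += 2
        # The merged increment is now gone; the updated full takes its place.
    return merge_ops


def restore_latest(full: Blocks, increments: list[Blocks]) -> Blocks:
    """Rebuild the newest state by applying the increments in order."""
    state = dict(full)
    for increment in increments:
        state.update(increment)
    return state


if __name__ == "__main__":
    full, chain = {0: b"a", 1: b"b"}, []
    for changed in ({0: b"a1"}, {1: b"b2"}, {0: b"a3"}):
        ops = forever_forward_run(full, chain, changed, retention=3)
        print(ops, full, chain)
```

Once the limit is reached, the chain keeps a constant length: each run adds one increment at the end and merges one into the full backup at the start.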
The total number of read/write cycles remains the same as with a reverse incremental backup, but what matters is how the data is processed. Creating an increment requires only one I/O operation per block, which means the virtual machine snapshot stays open for less time. The remaining 2 I/O operations are needed to update the full backup file, and the snapshot is no longer involved at that point. In addition, creating a new synthetic full backup is reduced to adding a single increment instead of combining a whole chain of increments, as would be the case with the forward incremental method and synthetic full copies. The "collapse" of the oldest increment into the full copy takes place outside the backup window, without loading the production environment.
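The comparison can be summarized numerically. The sketch below only restates the per-block operation counts given in the text (1 for forward, 3 for reverse, 1 plus 2 for forever forward) and splits them by whether the VM snapshot is still open. `IoCost` and `io_cost` are hypothetical names, and the model assumes the merged increment has roughly as many blocks as the new one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoCost:
    during_snapshot: int   # operations performed while the VM snapshot is open
    after_snapshot: int    # operations performed after the snapshot is removed

    @property
    def total(self) -> int:
        return self.during_snapshot + self.after_snapshot


def io_cost(method: str, changed_blocks: int, merged_blocks: int = 0) -> IoCost:
    """Per-run I/O cost using the per-block counts described in the article."""
    if method == "forward":
        return IoCost(changed_blocks, 0)
    if method == "reverse":
        return IoCost(3 * changed_blocks, 0)
    if method == "forever_forward":
        return IoCost(changed_blocks, 2 * merged_blocks)
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    for name in ("forward", "reverse", "forever_forward"):
        print(name, io_cost(name, changed_blocks=1000, merged_blocks=1000))
```

For 1000 changed blocks, the reverse and forever forward methods both total 3000 operations, but only 1000 of them fall inside the snapshot window in the forever forward case.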
P.S.
All of the algorithms above are illustrated more clearly in Veeam KB-1799.