Backup at the ready: busting holiday myths

Backup is not one of those trendy technologies you hear about from every corner. It simply has to exist in any serious company, that's all. Our bank backs up several thousand servers. It is complex and interesting work, and I want to talk about some of its subtleties, as well as about the typical misconceptions surrounding backups.
I have been working on this for almost 20 years, the last 2 of them at Promsvyazbank. When I started out, I did backups almost by hand, with scripts that simply copied files. Then convenient tools appeared in Windows: the Robocopy utility for preparing files and NT Backup for copying them. After that came specialized software, primarily Veritas Backup Exec, which is now called Symantec Backup Exec. So I have known backups for a long time.
To put it simply, backup means regularly saving a copy of data (virtual machines, applications, databases and files) just in case. That "case" usually takes the form of a hardware or logical failure that leads to data loss, and the job of the backup system is to minimize the damage from losing information. A hardware failure is, for example, the failure of the server or the storage system that hosts a database. A logical failure is the loss or corruption of part of the data, including through human error: someone accidentally deleted a table or a file, or ran a buggy script. There are also regulatory requirements to keep certain types of information for a long period, for example up to several years.

The most common reason to turn to backups is restoring a saved copy of a database to deploy test systems and clones for developers.
There are several common myths around backups that it is high time to dispel. Here are the best known.
Myth 1. Backup has long been just a small feature inside security or storage systems.
Backup systems are still a separate and fully independent class of solutions; the business entrusts them with too much. In fact, they are the last line of defense when it comes to data safety. So backup runs at its own pace, on its own schedule. The servers generate a daily report, and certain events act as triggers for the monitoring system.

Plus, role-based access to the backup system lets you delegate part of the authority for managing backups to the administrators of the target systems.
Myth 2. With RAID, you no longer need backups.

Undoubtedly, RAID arrays and data replication are a good way to protect information systems from hardware failures, and if there is a standby server, you can quickly switch to it when the main machine fails.
But redundancy and replication do not protect against logical errors made by users of the system. A standby server with delayed apply can help, yes, if the error is detected before it has been synchronized. But what if that moment is missed? Then only a backup made in time will help. If you know that the data was changed yesterday, you can restore the system as of the day before yesterday and extract the data you need from it. Given that logical errors are the most common, the good old backup remains a proven and necessary tool.
Myth 3. Backup is something done once a month.
Backup frequency is a configurable option that depends primarily on the requirements of the system being backed up. You can certainly find data that almost never changes and is not particularly important, so losing it would not be critical for the company.
Such data can indeed be backed up once a month or even less often. More critical data, however, is backed up more frequently, according to its RPO (Recovery Point Objective), which sets the acceptable amount of data loss. That can mean once a week, once a day, or even several times an hour: this is how often we back up DBMS transaction logs.
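To make the point-in-time idea concrete, here is a minimal sketch, assuming the backup catalog can be read as a list of timestamps of successful copies and that we know roughly when the bad change happened. Both inputs and the function name `pick_restore_point` are hypothetical; real backup software has its own catalog and restore interface.

```python
from datetime import datetime

def pick_restore_point(backup_times, corrupted_at):
    """Return the newest backup taken strictly before the logical error.

    backup_times: timestamps of successful backups (hypothetical catalog)
    corrupted_at: when the bad script ran or the table was dropped
    Returns None if no clean copy exists.
    """
    clean = [t for t in backup_times if t < corrupted_at]
    return max(clean) if clean else None

# Hypothetical daily backups taken at 01:00 from 20 to 26 July
backups = [datetime(2024, 7, d, 1, 0) for d in range(20, 27)]
# The bad change was made yesterday afternoon
damage = datetime(2024, 7, 25, 14, 30)
print(pick_restore_point(backups, damage))  # 2024-07-25 01:00:00
```

The restored copy is then used only as a source to extract the lost data, not necessarily to roll back the whole production system.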
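One way to check a schedule against an RPO is to look at the actual history rather than the nominal schedule: a failure just before the next backup loses everything written since the previous one, so the worst case is the longest gap between successful backups. A sketch under that assumption, with hypothetical names and data:

```python
from datetime import datetime, timedelta

def worst_case_loss(backup_times, now):
    """Longest gap between consecutive successful backups, counting the gap
    from the last backup until `now`. A failure at the end of that gap
    would lose everything written during it."""
    if not backup_times:
        raise ValueError("no successful backups: the possible loss is unbounded")
    times = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(times, times[1:]))

def meets_rpo(backup_times, now, rpo):
    return worst_case_loss(backup_times, now) <= rpo

# Hypothetical transaction-log backups every 15 minutes; run no. 4 was missed
start = datetime(2024, 7, 26, 9, 0)
logs = [start + timedelta(minutes=15 * i) for i in range(8) if i != 4]
now = start + timedelta(minutes=110)
print(worst_case_loss(logs, now))                       # 0:30:00
print(meets_rpo(logs, now, rpo=timedelta(minutes=20)))  # False
```

On paper a 15-minute schedule satisfies a 20-minute RPO, but a single missed run doubles the real exposure, which is why failed jobs matter even when the next one succeeds.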

When a system is put into production, its backup documentation must be approved. It covers the key points: the update procedure, the system recovery procedure, the procedure for storing backups, and so on.
Myth 4. The volume of backups grows constantly and fills any space allocated to it
Backups have a limited lifetime. It makes no sense, for example, to keep all 365 daily backups for a year. As a rule, daily copies may be kept for 2 weeks, after which they are replaced with fresh ones, while the first copy made each month is kept for long-term storage. That copy, in turn, is also kept only for a set time: every copy has a lifetime.
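The two-tier scheme described above can be sketched as follows. The 14-day and 365-day values and the function name are illustrative assumptions, not settings taken from any particular product; backup software enforces retention on its own.

```python
from datetime import date, timedelta

def copies_to_keep(backup_dates, today, daily_days=14, monthly_days=365):
    """Two-tier retention: keep every daily copy younger than `daily_days`,
    plus the first copy of each month for `monthly_days`. Everything else
    may expire. The default values are illustrative only."""
    keep = {d for d in backup_dates if (today - d).days < daily_days}
    first_of_month = {}
    for d in sorted(backup_dates):
        first_of_month.setdefault((d.year, d.month), d)  # earliest copy wins
    keep |= {d for d in first_of_month.values()
             if (today - d).days < monthly_days}
    return sorted(keep)

# Hypothetical daily backups from 1 May to 26 July 2024
start, today = date(2024, 5, 1), date(2024, 7, 26)
backups = [start + timedelta(days=i) for i in range((today - start).days + 1)]
kept = copies_to_keep(backups, today)
print(len(kept))  # 17: 14 recent dailies + the 1 May, 1 June and 1 July copies
```

Out of 87 daily copies only 17 remain, and the number stays roughly constant as time goes on. This sketch only decides which copies are eligible to expire; it does not by itself guarantee that a replacement exists before deletion.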

There is also protection against data loss. The rule is: a backup is not deleted until the next one has been created. So the data will not be deleted if a backup fails, for example because the server is unavailable. The system enforces not only retention periods but also the number of copies in a set. If the policy says there should be two full backups, there will always be two, and the oldest will be deleted only after a new, third one has been written successfully. So the growth of the space occupied by the backup archive depends only on the growth in the amount of protected data, not on time.
Myth 5. Backup starts and everything freezes
Let's put it this way: if everything freezes, the administrator is to blame. In general, backup performance depends on many factors. For example, on the speed of the backup infrastructure itself: how fast its disk storage and tape libraries are. On the speed of the backup servers: whether they keep up with processing the data, compression and deduplication. And on the speed of the communication links between client and server.
A backup can run in one or several streams, depending on whether the system being backed up supports multithreading. For example, Oracle DBMS lets you run several streams, as many as there are available processors, until throughput hits the network bandwidth limit.
If you back up with too many streams, you risk overloading the production system, and it really will start to slow down. So the optimal number of streams is chosen to give sufficient performance. If even the slightest drop in performance is critical, there is a great option: back up not from the production server but from its clone, a standby in database terminology. This does not load the main production system, and data can be pulled through more streams, since the clone is not serving users.
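The rule of never deleting a copy before the next one succeeds can be modelled as a small set that expires old copies only as a side effect of a successful run. `FullBackupSet` is a hypothetical helper for illustration:

```python
from collections import deque

class FullBackupSet:
    """Hypothetical helper that keeps `copies` successful full backups.

    A failed run changes nothing. A successful run is added first, and
    only then are the oldest copies beyond the limit expired, so a
    failure never reduces the number of stored copies."""

    def __init__(self, copies=2):
        self.copies = copies
        self.backups = deque()

    def record_run(self, backup_id, succeeded):
        """Register a run; return the IDs of copies that may now be deleted."""
        if not succeeded:
            return []  # e.g. the server was unavailable: keep every copy
        self.backups.append(backup_id)
        expired = []
        while len(self.backups) > self.copies:
            expired.append(self.backups.popleft())
        return expired

s = FullBackupSet(copies=2)
s.record_run("full-1", True)
s.record_run("full-2", True)
print(s.record_run("full-3", False))  # []
print(s.record_run("full-3", True))   # ['full-1']
print(list(s.backups))                # ['full-2', 'full-3']
```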
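The trade-off above can be expressed as a rough model: at most one stream per processor, no more streams than are needed to saturate the link, and on a production server no more than it can tolerate. The per-stream throughput and the production cap are assumptions you would measure in your own environment; in Oracle RMAN, for instance, the resulting number would roughly correspond to the channel parallelism setting.

```python
import math

def choose_streams(cpus, link_mb_s, per_stream_mb_s, max_prod_streams=None):
    """Rough model for the number of parallel backup streams.

    cpus             -- at most one stream per available processor
    link_mb_s        -- network bandwidth to the backup server, MB/s
    per_stream_mb_s  -- measured throughput of a single stream, MB/s
    max_prod_streams -- cap that a production server can tolerate;
                        None when backing up from a standby clone
    """
    # Streams beyond this number only add load: the link is already full.
    saturating = math.ceil(link_mb_s / per_stream_mb_s)
    streams = min(cpus, saturating)
    if max_prod_streams is not None:
        streams = min(streams, max_prod_streams)
    return max(1, streams)

# Hypothetical numbers: 10 Gbit/s link (about 1250 MB/s), 150 MB/s per stream
print(choose_streams(32, 1250, 150))                      # 9 (from a standby)
print(choose_streams(32, 1250, 150, max_prod_streams=4))  # 4 (from production)
```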
In large organizations, a separate network is built for the backup system so that backups do not affect production. In addition, backup traffic can travel over the SAN instead of the network.

We also try to spread the load over time. Backups mostly run outside business hours: at night and on weekends. And they do not all start at the same time. Virtual machine backups are a special case: the process has almost no effect on the performance of the machine itself, so these backups can be spread across the daytime instead of being pushed to the night. There are many subtleties, but if you take them all into account, backup will not affect system performance.
Myth 6. Launch a backup system and you have fault tolerance.
Never forget that the backup system is the last line of defense, which means there must be five more systems in front of it that ensure the continuity, high availability and disaster resilience of the enterprise's IT infrastructure and information systems.
Don't count on a backup to restore all the data and quickly bring a failed service back up. Data written between the last backup and the failure is guaranteed to be lost, and loading the data onto a new server can take hours (or days, if you are unlucky). So it makes sense to build fully fault-tolerant systems rather than relying on backup for everything.
Myth 7. I set up backup once and checked that it works. Now I only need to watch the logs
This is one of the most harmful myths, and you only realize it is false during an incident. Logs of successful backups do not guarantee that everything really went as it should. It is important to check in advance that a stored copy can actually be restored: that is, run a restore in a test environment and examine the result.
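Spreading start times across a window can be as simple as spacing jobs evenly. This sketch assumes a fixed nightly window and ignores job duration, which a real scheduler would have to take into account; the job names and the six-hour window are hypothetical.

```python
from datetime import datetime, timedelta

def stagger_starts(jobs, window_start, window_hours=6):
    """Spread job start times evenly across a backup window instead of
    launching everything at once."""
    if not jobs:
        return {}
    step = timedelta(hours=window_hours) / len(jobs)
    return {job: window_start + i * step for i, job in enumerate(jobs)}

plan = stagger_starts(["db-01", "db-02", "mail", "files"],
                      datetime(2024, 7, 26, 22, 0))
for job, t in plan.items():
    print(job, t.strftime("%H:%M"))
# db-01 22:00
# db-02 23:30
# mail 01:00
# files 02:30
```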
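Part of such a test restore can be automated. The sketch below assumes a checksum manifest is recorded when the backup is taken and compares it with the files restored into a test directory. It only shows that the files came back byte for byte; a database would still have to be started and queried. All names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path):
    """Checksum a file in 1 MiB chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path relative to root to its SHA-256 checksum."""
    root = Path(root)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in root.rglob("*") if p.is_file()}

def verify_restore(source_manifest, restored_root):
    """Return (missing, corrupted): files absent from the restore and files
    whose contents differ. Two empty lists mean a byte-for-byte match."""
    restored = build_manifest(restored_root)
    missing = sorted(set(source_manifest) - set(restored))
    corrupted = sorted(p for p, digest in source_manifest.items()
                       if p in restored and restored[p] != digest)
    return missing, corrupted

# Simulated source data and a test restore with one damaged file
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    Path(src, "a.txt").write_text("ledger")
    Path(src, "b.txt").write_text("clients")
    manifest = build_manifest(src)  # recorded at backup time
    Path(dst, "a.txt").write_text("ledger")
    Path(dst, "b.txt").write_text("client")
    result = verify_restore(manifest, dst)
print(result)  # ([], ['b.txt'])
```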
And a few words about the system administrator's work
Nobody has copied data by hand for a long time. Modern backup systems can back up almost anything; you just need to configure them properly. When a new server is added, you register policies for it: select the content to back up, specify the storage options and set the schedule.
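Registering policies for every new server is also easy to forget, so it helps to periodically check the server inventory against the configured policies. A minimal sketch with a hypothetical `BackupPolicy` structure; real backup software has its own policy objects and reporting.

```python
from dataclasses import dataclass

@dataclass
class BackupPolicy:
    """Hypothetical policy: what to back up, where to, and when."""
    client: str        # server the policy protects
    selections: list   # content to back up: paths, databases, VMs
    storage: str       # destination: a disk pool, a tape library
    schedule: str      # when to run; free-form text in this sketch

def unprotected(inventory, policies):
    """Servers from the inventory that no policy with actual content covers."""
    covered = {p.client for p in policies if p.selections}
    return sorted(set(inventory) - covered)

policies = [
    BackupPolicy("db-01", ["/u01/oradata"], "disk-pool-1", "daily 01:00"),
    BackupPolicy("mail-01", [], "tape-lib-1", "daily 03:00"),  # nothing selected
]
print(unprotected(["db-01", "mail-01", "web-07"], policies))  # ['mail-01', 'web-07']
```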

That said, there is still plenty of work, given the large server fleet, which includes databases, mail systems, virtual machine clusters, and file resources on both Windows and Linux/Unix. The team supporting the backup system is never idle.
In honor of the holiday, I wish all administrators strong nerves, steady hands and infinite space for storing backups!