Accelerating backups. A Yandex lecture

    The next few lectures are based on the first Ya.Subbotnik on databases, which was held in the spring. The first speaker was developer Andrei Borodin. He talked about WAL-G, a simple and effective tool for backing up PostgreSQL to the cloud, and about the algorithms and technologies that let WAL-G create backups faster. The main feature of WAL-G is delta backups. From the lecture you will learn how they are implemented and how support for this technology is developing in PostgreSQL.


    - Hello! I am a developer at Yandex, based in Yekaterinburg, and my topic is fast backup technology. We have been doing backups for a long time; there have been talks by Vladimir Borodin and Evgeny Dyukov on how we research and develop ways to store data safely, securely, conveniently and efficiently. This series is devoted to the latest developments in this area.

    Let's talk about backups in PostgreSQL in general. The standard data transfer utility, pg_dump, is a console tool that creates a file with a logical representation of your data.

    The fact that this is a logical copy is quite convenient: you can transfer data between different versions, you can dump individual parts of your database, and it is a standard tool that, for example, ships in the box with the pgAdmin administration utility.


    The first thing to know about pg_dump is that it is a developer tool.

    It is not a database maintenance tool; pg_dump is not designed to work with a high-load database.


    Suppose you are serious and want point-in-time recovery, which uses the PostgreSQL API for online backups. You call the pg_start_backup function and make a file-level copy of the database. In fact, pg_start_backup makes the database do a CHECKPOINT and enables full-page writes in the write-ahead log. The copy of the database you make while the API call is in effect is not a consistent copy of the data. You also need the write-ahead log to be able to restore your database to the moment of the pg_stop_backup call, that is, to the end of the backup.



    After the backup has finished, and given the write-ahead log, you can recover to the desired point in the lifetime of your database.

    The pg_basebackup utility, which comes in the box, implements all of this technology canonically and lets you make a backup with the minimum necessary functionality.

    If you are even more serious than that, you use some kind of backup management software, and this is usually Barman.

    It has a few advantages. The main one is that it is a very common utility with a huge community and a huge number of questions answered on Stack Overflow.



    It is enough to set up one backup server and back up all of your PostgreSQL instances to it. This is very convenient, as long as you have only one backup server.

    As soon as you have many backup servers, you need to keep track of whether any of them is full. If a backup server fails, you need to understand which of your databases are at risk. You need to understand, in principle, where to copy a new database cluster when it is created.

    There is a much simpler backup utility called WAL-E.


    WAL-E implements four main commands. WAL-PUSH sends one WAL file to the cloud, and WAL-FETCH retrieves one WAL file when restore_command needs it.

    There are also BACKUP-PUSH (implements the base backup API) and BACKUP-FETCH (retrieves all the data back from the cloud). The data is stored in the cloud, so you do not need to monitor anything; ensuring your data is available when you need it is the cloud provider's problem. This is probably the main advantage of WAL-E.
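    As a rough illustration of how these commands plug into PostgreSQL (the paths and the envdir location here are placeholders; consult the WAL-E README for the exact invocation for your setup):

```ini
# postgresql.conf -- ship each completed WAL segment to the cloud
archive_mode = on
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

# recovery.conf -- pull segments back during point-in-time recovery
restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch "%f" "%p"'
```

    A base backup is then typically taken from cron with `wal-e backup-push` pointed at the data directory, and restored with `wal-e backup-fetch`.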


    There is quite a lot of functionality: backup listing and a retention policy, that is, keeping backups for the last seven days, for example, or the last five backups, something like that. And WAL-E can back up to a huge variety of cloud services: S3, Azure, Google, and it can even treat a local file system as a cloud.


    It has several properties. First, it is written in Python and makes heavy use of Unix pipelines; partly because of this it has dependencies and is not very performant. That is fine, because WAL-E focuses on ease of use and ease of setup, so that you cannot make a mistake when setting up your backup plan. And this is a very good idea.

    A lot of features have been written into WAL-E, and it was not very clear to the authors where to take it next. The idea arose that a new tool was needed.



    Its main feature is that it is rewritten in Go, it has almost no external dependencies unless you use external encryption, and it is much more performant.



    WAL-G was originally created by two authors from Citus Data, and its main advantage is shown on this histogram: the speed of shipping WAL segments. We see that WAL-E's speed varies wildly; it can be anything, and there is a big bar near zero.



    WAL-G has fairly stable throughput. In tests at Citus Data it showed that it consistently ships WAL at about 800 MB/s.

    In addition, I wrote a feature for WAL-G that implements backups from a replica. You do not need to put the read load of a backup on your master; you can take the backup from a replica.



    But there is one small problem. At the moment you start taking a backup, you have to give it a name, and the name includes the timeline, which changes if a replica is promoted. If a failover happens in the chain of replicas ahead of the one you are backing up from and any replica is promoted, the timeline changes. WAL-G understands that this situation is inconsistent: a name on the old timeline promises that you can continue the history of the database in any direction that existed, and that is no longer true; you have already gone in one direction and cannot jump back to another timeline. So WAL-G detects this situation and does not upload the metadata JSON file to the cloud. You still get a physical copy, but administrator intervention is required to use it.



    We have implemented delta copies in WAL-G; I did this development as well. This lets the next base backup take less data: you do not copy the data pages that have not changed since the previous backup.


    When configuring WAL-G, you specify the maximum number of delta-backup steps away from a full base backup, and you specify the delta policy: either each delta is taken from the latest existing delta, or each delta is taken from the original full backup. The latter is needed when it is always the same data in the database that keeps changing.
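    A minimal sketch of this decision, in Go since WAL-G is written in Go. The type and function names are invented for illustration; the parameters loosely mirror settings like a maximum delta-step count, but this is not WAL-G's actual code:

```go
package main

import "fmt"

// Backup is one entry in a chain of backups; StepsFromFull is 0 for a full backup.
type Backup struct {
	Name          string
	StepsFromFull int
}

// chooseBase picks the base for the next backup. It returns nil when a fresh
// full backup should be taken (no chain yet, or the chain is already maxSteps
// long). With fromFull set, every delta diffs against the full backup at the
// root of the chain; otherwise it diffs against the latest delta.
func chooseBase(chain []Backup, maxSteps int, fromFull bool) *Backup {
	if len(chain) == 0 || chain[len(chain)-1].StepsFromFull >= maxSteps {
		return nil
	}
	if fromFull {
		return &chain[0]
	}
	return &chain[len(chain)-1]
}

func main() {
	chain := []Backup{{"base_000", 0}, {"base_000_D_001", 1}}
	if b := chooseBase(chain, 3, false); b != nil {
		fmt.Println("delta from", b.Name) // latest-delta policy: base_000_D_001
	}
	if b := chooseBase(chain, 3, true); b != nil {
		fmt.Println("delta from", b.Name) // from-full policy: base_000
	}
	if chooseBase(chain, 1, false) == nil {
		fmt.Println("chain too long: take a full backup")
	}
}
```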

    Why do you need delta copies of the database at all? In theory you have the WAL, so you can roll forward to any point.

    On a busy server, replaying five physical seconds of the past can take four physical seconds of the present. If you are asked to roll forward four days of WAL, the person asking may have to wait roughly three more days. That is not always an acceptable situation.

    You need base backups for every day, but nevertheless you cannot keep 7 or 14 full copies of your database; even though WAL-G compresses them, they would still be quite large. This is where delta copies help.


    When developing delta copies, several possible data file formats were discussed. First a format was proposed where we simply zero out the pages that have not changed. We concluded this is not a very efficient approach; the zeros do compress well, but we later rejected this storage format because it is difficult to debug in emergency situations.


    The next technology considered was storing the block number first and then the changed block. But here we ran into a peculiarity of storing data in TAR files: you must first declare the size of the TAR member that holds your delta copy, and only then start writing it. We wanted an implementation that consumes a minimum of RAM, so we had to use a third format, in which we first read each data file completely, look for changed data pages, store the numbers of the changed blocks in the TAR file first, and only then the changed blocks themselves.




    This feature is not implemented yet; I am either going to do it or am looking for someone who wants to make a pull request to WAL-G. When restoring from a delta copy, the database passes through each of its incarnations, one for each step of the delta backup. Sometimes files are deleted partway through. In that case we would not need to materialize their state if they are deleted anyway and then recreated further along the delta chain. It does not seem to be a very complicated feature, so if you are interested in writing some Go, take a look at it.


    Here is a graph of network, CPU and disk usage. The WAL-E backup, as we can see, has not finished yet: it started at one in the morning Moscow time and had not ended by the last data point we see. The WAL-G run finished: it ran faster and consumed resources much more evenly.

    The most interesting one is the resource consumption graph for delta copies. We see that all resources have dropped to almost zero; the CPU load is practically the normal load of the database, with some queries running at night. But we see a big spike in reads. I am dealing with that too; I would welcome a pull request, or I will do it myself in the summer. The point is that we still have to read our data in order to find out what has changed in it. This reading could be avoided.



    WAL-G has deletion: we specify the number of backups, or the date, from which we need to keep all WALs and all base backups, and WAL-G works out for itself which WALs and base backups are needed. We do not yet have a feature that removes everything, as WAL-E does; this is another occasion for a pull request, since the interesting DELETE EVERYTHING command is not implemented here yet.
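    A sketch of the retain-last-N part of such a policy (illustrative Go; it assumes backup names sort chronologically, and a real tool must also keep the WALs and delta bases that the retained backups depend on):

```go
package main

import (
	"fmt"
	"sort"
)

// retain splits backups into the newest `count` to keep and the rest to
// delete. WAL segments older than the oldest kept backup become garbage too,
// since nothing can be restored from before that point.
func retain(backups []string, count int) (keep, drop []string) {
	sorted := append([]string(nil), backups...)
	sort.Strings(sorted)
	if count >= len(sorted) {
		return sorted, nil
	}
	cut := len(sorted) - count
	return sorted[cut:], sorted[:cut]
}

func main() {
	keep, drop := retain([]string{"base_001", "base_003", "base_002"}, 2)
	fmt.Println(keep, drop) // keeps base_002 and base_003, drops base_001
}
```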



    There is a listing of backups.


    We set the environment variables WAL-G needs in order to function and call the wal-g console utility; it displays the backups we want to view.


    WAL-G implements quite a few technologies for parallelizing backups and operations in general. For example, one of them is used for shipping WAL segments to the archive: as soon as PostgreSQL calls archive_command to send a single file, WAL-G looks to see whether there are more ready files nearby.

    In general, this is not a very well documented feature, but it is very stable in recent versions of PostgreSQL and many tools use it. We look to see whether there are WAL files already in ready status in archive_status, and we send them along with the one we were asked to archive. When PostgreSQL later asks to send them, we have already sent them; everything is done. This significantly speeds up WAL shipping on loaded databases and makes it no longer single-threaded. Normally PostgreSQL prepares one file, waits for it to be shipped, then prepares the next one. We manage to avoid this serialized work.


    With WAL-FETCH, when a cluster is being restored, we also try to download the next N files that will be needed, and we try to balance the pauses between starting WAL file prefetches so as to saturate whatever resource we have: either push the network to its limit or, in rare cases, run up against the disk.


    All of this is configured with environment variables.


    There is also parallelism in uploading the backup itself. This feature had not shipped in a release at the time of the talk; it came out in release 0.1.10 in June 2018 (ed.). Parallel reading from the disk lets you reliably saturate either the network or the disk, which is not great on a loaded database. WAL-E had a feature that allows throttling; we do not have it yet. It is needed to limit the speed at which the backup is taken, so that the database can live its life normally and serve its load.

    Our main feature, though, is not really about technology.

    Two years ago, Zhenya Dyukov implemented delta backup technology for Barman; it has still not been merged, the community is still discussing it.

    Almost a year ago Zhenya fixed a WAL-E bug, and it took half a year to get our fix merged (link to GitHub — ed.). Quite often, open source solutions have the problem of not being very well maintained.


    With WAL-G everything is quite simple: we use it, and I maintain it. If we or you need something, you simply report that you have a problem, and we will try to solve it.

    The standard community request is "give us more of everything".

    More cryptography, more platforms. Maybe back up not only PostgreSQL but MySQL too, or something else? I see a few other directions.


    First of all, when shipping WAL segments we could already work out which blocks of the database have changed: scan those WAL files and save information about what changed.

    Then, when cron comes for the next delta backup, we would not have to scan the entire database; we would cut down disk reads and simply know which pages need to be pushed to the cloud.

    We tried to use the PTRACK page-tracking technology. It implements page change tracking at the core level, and backups are taken very quickly. The main problem with PTRACK is that it is very invasive: it makes a lot of changes to the PostgreSQL core in very sensitive places, so it is unlikely to be accepted soon.


    In addition, its deltas are slightly different from the ones we have now. When taking an LSN-based delta, we put into the delta file all changes that occurred from the start of the previous backup up to the current moment.


    With PTRACK, the delta file contains the changes made since the previous time the delta map was retrieved; we do not get the delta exactly as of the moment the backup started. This is not PTRACK's main problem; in general it works well, but so far it is hard to get it accepted.


    PTRACK does not let you take a delta in LATEST_FULL mode, because it stores the map of changed blocks only since that map was last retrieved. Oracle has an interesting technology here: it keeps the 8 previous maps, just in case. Maybe we could do something similar, but for now we are not working in this direction.



    Last September I tried to propose to the community a technology in which we add only the necessary hooks to the core and implement the tracking of changed pages in an extension, so that the page-tracking patch is not too invasive. After discussing this technology, we concluded that several prototypes are needed, and the hooks will be added once the prototypes exist; then we will see how they work. Right now I am working on prototypes of these extensions, which could use core hooks to track changes in the database.

    There is a saying in the community that every Postgres developer has a backup tool of their own. This is bad: everyone builds their own thing to solve a critical task. There should be one tool that has everything in the box; in a perfect world everything would be in the box.

    What would we like to see in the box in basebackup? Probably archiving to the cloud, and delta copies.


    We would also like compression, parallelism, encryption, throttling, backup listing, backup verification and validation... a lot of things. We understand that if all of this were proposed to the community right now, it would be several dozen patches that are rather hard to discuss and shepherd through commitfests. So for now we still use a separate tool, but there is a desire to devote time and technology to the community to make PostgreSQL better.
