10 ways to back up PostgreSQL

    Many conversations about backups begin with the saying that people are divided into two categories... and so I belong to those who make backups. Properly configured backups and backup validation make for sound sleep. And having pre-written and tested restoration instructions strengthens digestion and immunity altogether. While working with PostgreSQL I have often had to configure backups under very different conditions and requirements, yet with a few exceptions the toolbox has remained the same. In this article I will share my experience of how to take PostgreSQL backups.


    If we treat a backup as a very concrete process, two simple questions arise:
    1. Where should the backup be run from?
    2. Which tools should be used?

    There are two possible answers to the first question: you can run the backup task from a dedicated backup server, which in my opinion is the most suitable option; or you can run the task directly on the database server, if there is no dedicated backup server.

    With tools, things are much more interesting. Here I distinguish two groups: the main tools and the auxiliary ones. The main ones actually perform the backup; the auxiliary ones add something extra to the process, such as archiving, encryption, load management, etc.

    There are two utilities in the PostgreSQL distribution that can take backups: pg_dump / pg_dumpall and pg_basebackup. In addition, file copying utilities such as rsync, tar, cp, etc. can be used.
    So, which tool should be used?
    pg_dump - suitable for cases when you need to back up a single table, database, schema, or just the data (a minimal example follows this list);
    pg_basebackup - suitable for cases when you need to back up the entire database cluster or set up a hot standby replica;
    rsync / tar / cp - also used for cases when the entire cluster is copied.
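    For reference, a minimal pg_dump sketch (the database name mydb and the output path are illustrative); the custom format (-F c) produces a compressed dump that is later restored with pg_restore:
    backup@backup ~ $ pg_dump -h db01.example.com -U backup -F c -f /backup/db01/mydb-$(date +%Y-%m-%d).dump mydb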

    Back when PostgreSQL 9.0 was current, backups were taken with rsync; pg_basebackup appeared in 9.1 and has several advantages over rsync:
    • pg_basebackup does not require ssh access, only a replication connection permitted in pg_hba.conf;
    • pg_basebackup is richer in functionality (WAL copying, recovery.conf creation, built-in gzip compression, etc.);
    • pg_basebackup does not require separate calls to the pg_start_backup / pg_stop_backup functions, which are needed when using rsync / tar / cp;
    • pg_basebackup copies faster than rsync, since it uses the streaming replication protocol.

    but there are some disadvantages:
    • pg_basebackup ships with PostgreSQL, so the PostgreSQL packages must be installed on the machine that runs it;
    • pg_basebackup has no built-in way to limit the copy speed (promised for 9.4);
    • pg_basebackup requires wal_level = hot_standby and a non-zero max_wal_senders to be set in postgresql.conf.
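    For reference, a minimal configuration sketch for the database server (the values and the address range are illustrative; the backup role also needs the REPLICATION attribute):
    # postgresql.conf
    wal_level = hot_standby
    max_wal_senders = 2
    # pg_hba.conf - allow the backup role to connect for replication
    host  replication  backup  192.168.0.0/24  md5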


    Here I will consider pg_basebackup, although pg_dump can also be used in the methods listed below.

    1. A simple, no-frills backup from the backup server into the /backup directory (the directory must be created beforehand):
    backup@backup ~ $ pg_basebackup -x -h db01.example.com -U backup -D /backup
    

    2. Copying with a lower I/O priority using ionice, for cases when you need to reduce the disk I/O load caused by the backup:
    postgres@db01 ~ $ ionice -c 3 pg_basebackup -x -h db01.example.com -U backup -D /backup
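    If the CPU priority should be lowered as well, nice (mentioned in the summary below) can be combined with ionice; a sketch:
    postgres@db01 ~ $ nice -n 19 ionice -c 3 pg_basebackup -x -h db01.example.com -U backup -D /backup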
    

    3. Copying with bzip2 compression, for cases when you need a compression algorithm other than the gzip built into pg_basebackup. Here the data is passed via standard output (stdout) into the standard input (stdin) of the bzip2 program:
    backup@backup ~ $ pg_basebackup -x --format=tar -h db01.example.com -U backup -D - |bzip2 -9 > /backup/db01/backup-$(date +%Y-%m-%d).tar.bz2
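    To restore from such an archive, extract it into an empty data directory while the server is stopped (a sketch; the file name and the target path are illustrative):
    backup@backup ~ $ bzcat /backup/db01/backup-2014-05-09.tar.bz2 |tar xf - -C /var/lib/pgsql/9.3/data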
    

    4. Copying with compression in several threads (lbzip2 using 6 cores). In this situation you can make use of idle cores and speed up compression:
    backup@backup ~ $ pg_basebackup -x --format=tar -h db01.example.com -U backup -D - |lbzip2 -n 6 -9 > /backup/db01/backup-$(date +%Y-%m-%d).tar.bz2
    

    5. Here, the copy is started on the database server, and the resulting backup is sent to a remote server via ssh:
    postgres@db01 ~ $ pg_basebackup -x --format=tar -h 127.0.0.1 -U backup -D - |ssh backup@backup.example.com "tar xf - -C /backup/"
    

    6. Here, the copy is also launched on the database server and sent to a remote server, but compressed into 6 streams with lbzip2:
    postgres@db01 ~ $ pg_basebackup -x --format=tar -h 127.0.0.1 -U backup -D - |ssh backup@backup.example.com "lbzip2 -n 6 -9 > /backup/db01/backup-$(date +%Y-%m-%d).tar.bz2"
    

    7. Copying to a remote server with the transfer rate limited to 10 MB/s using pv, with compression on the remote side. This option is for cases when the backup must not saturate the network:
    postgres@db01 ~ $ pg_basebackup -x --format=tar -h 127.0.0.1 -U backup -D - |pv -r -b -L 10M |ssh backup@backup.example.com "bzip2 -9 > /backup/db01/backup-$(date +%Y-%m-%d).tar.bz2"
    

    It is worth noting that since 9.4 pg_basebackup has a built-in option for limiting the transfer rate (-r, --max-rate).
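    A sketch of the 9.4 equivalent of the pv-based example above (the M suffix means megabytes per second):
    backup@backup ~ $ pg_basebackup -x --max-rate=10M -h db01.example.com -U backup -D /backup/db01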
    8. Copying starts on the backup server, and the stream is then split in two: one part is compressed with bzip2 (the backup itself), and the second is unpacked through tar into a temporary directory for subsequent validation. The method is rarely used, but the implementation itself is interesting:
    backup@backup ~ $ pg_basebackup -x --format=tar -h db01.example.com -U backup -D - |tee >(bzip2 -9 -c > /backup/db01/backup-$(date +%d-%b-%Y).tar.bz2) |tar xf - -C /backup/validation/
    

    9. Copying with lbzip2 on both nodes, for cases when the network bandwidth is low: the stream is compressed first, then transmitted over the network, and decompressed on the remote side. This variant uses plain tar and requires pg_start_backup('label_name') to be executed on the postgres side first (and pg_stop_backup() afterwards; see the sketch after the commands):
    postgres@master # cd /var/lib/pgsql/9.3/data
    postgres@master # tar cf - ./ |lbzip2 -n 2 -5 |ssh postgres@standby "lbunzip2 -c -n 2 |tar xf - -C /var/lib/pgsql/9.3/data"
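    The low-level backup API calls that must wrap the copy above look like this (a sketch; the label is arbitrary):
    postgres@master # psql -c "SELECT pg_start_backup('label_name');"
    (run the tar | lbzip2 | ssh pipeline shown above)
    postgres@master # psql -c "SELECT pg_stop_backup();"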
    

    10. Backup with encryption via GPG, for cases when the backup needs to be encrypted. First, create the keys via gpg --gen-key (in my case the key is named backup):
    backup@backup ~ $ pg_basebackup -x --format=tar -h db01.example.com -U backup -D - |gpg -r backup -e |bzip2 -9 > /backup/db01/backup-$(date +%d-%b-%Y).tar.bz2
    

    To decrypt the backup, run the following command:
    backup@backup ~ $ bzcat /backup/backup-09-May-2014.tar.bz2 |gpg -r backup -d |tar xf - -C /example/dir/
    

    That's all; to summarize the tools:
    • pg_basebackup - a utility for taking postgres backups;
    • lbzip2 - bzip2 compression using several cores, for when you need to compress faster (analogues: pbzip2, pigz);
    • ionice - adjusts the class and priority for the I/O scheduler (nice can similarly be used to adjust process priority for the CPU scheduler);
    • pv - controls the amount of data passing through a pipe; here it is used to limit the transfer rate per unit of time (analogue: throttle);
    • tar - an archiving utility, needed for auxiliary purposes when bzip2 / gzip compression is not used;
    • tee - reads from stdin and writes to stdout and to files (part of coreutils);
    • gpg - handles the encryption tasks.

    Thank you all for your attention!
