Backup and Restore in PostgreSQL

Suppose we have postgresql in streaming replication mode. master server and hot-standby ready to replace a dead comrade. With a bad development of events, we can only create a trigger-file and switch our applications to work with the new wizard. However, situations are possible when completely legitimate changes were made by crookedly written migration and ended up both on the master and on the slave server. For example, the data in part of the tables were deleted / changed, or the tables were completely deleted. From the point of view of the database, everything is fine, but from the point of view of business - a disaster. In this case, the proclamation of a hot hot-standby in the master, the procedure is clearly useless ...
To prevent this situation, there are at least two options ...

use periodic backup using pg_dump;
Use backups based on WAL backups and archives.

The first method is quite simple to implement and requires a minimum of installation and maintenance efforts. We put "pg_dump | lbzip2 "in crowns and forget. However, this option does not offer to restore the database directory at the time preceding the failure, but only at the time of the backup. The second option is a bit more complicated and more expensive in terms of storage, but this option is a more flexible solution in case of recovery. It’s about to be discussed.
From the pros:

the ability to restore the database cluster at any point in time relative to the time of creating the base copy and the time of failure;
As a condition for recovery, both a timestamp and a specific transaction can serve.

Minuses:

The base copy is the approximate size of the database cluster.
the need to store WAL archives during the storage period of the base copy.

As mentioned above, this backup method offers flexible recovery options (you can restore the state of the database at a clearly specified point in time or at the time before or after a certain transaction), but at the same time adds significant requirements for storing backups. The implementation is as follows:

setting archiving mode for WAL logs;
backup setup;
storing one or more backups;
deletion of the oldest backup in case of successful completion of claim 1;
removal of the corresponding WAL archives from the backup from clause 3;
optionally, it is possible to carry out a procedure for checking backups for their “professional suitability”.

The archiving mode of WAL logs is configured by including the archive_mode and archive_command parameters in postgresql.conf and creating a directory where the archives will be stored. For starters, you should enable archiving mode and evaluate the volume of archives created in one day of database operation. This will allow an assessment of the required storage space for archives and base copies. The following options are responsible for archiving:
archive_mode = on
archive_command = 'cp% p / opt / pgsql / pgbackup / archive /% f'

We configure direct backup using pg_basebackup. This is a program from a set of utilities that come with PostgreSQL that can be used both for setting up streaming replication and for taking backups. The principle of operation allows you to remove the backup without stopping the database cluster. Based on the task, we just need to run pg_basebackup on schedule in cron. Considering the requirements in place, you need to take care of sufficient disk space in order to avoid overflow.

Storage of backups is an optional task, since it is enough to have at least one backup. It is understood that at the time of starting the backup, we agree with the statement that the database is in the “correct” state (we won’t copy the broken database).

After the backup is completed, the old copy can be deleted along with the archives. Archives are deleted using the pg_archivecleanup utility. The utility allows you to accurately remove unnecessary archives based on special tag files that are created during backup.

It is also important to configure the procedure for checking the backup after it is created. The algorithm is quite simple: you need to copy the base copy to a certain sandbox directory (carefully, place!), Create in it the minimum necessary configuration files necessary to run in recovery mode and run postgres relative to this sandbox directory, after starting it is necessary to analyze the log and make a conclusion Whether the backup is recoverable.
Thus, the process fits into three steps: creating a base copy, checking it and deleting the old, previous base copy.

Now suppose the worst has happened and you need to perform a recovery. You need to stop the main postgres cluster and rename the database directory to an arbitrary name. The backup directory must be renamed to the database cluster directory. If necessary, copy the configuration files. After defining the configuration files, run postgres relative to our directory. Upon startup, Postgres will detect recovery.conf and will start in recovery mode. It remains to wait until postgres restores its state using archives, after which it will be possible to connect to the database and continue working. That's all, the recovery procedure is complete.

Here it is. Keep your data safe! Scripts for backup and validation of copies are here .

Tags:

Backup and Restore in PostgreSQL

Also popular now: