Ten Million Backup Script

This article is a manual for a backup script that I wrote. The script is written in Python and runs on Linux. If you are interested, read on.


  • Creating differential / full copies of folders
  • Creating differential / full copies of folders on the BTRFS file system
  • Creating differential / full copies of LVM volumes
  • Creating BTRFS snapshots
  • Backup / snapshot rotation
  • Logging backup progress
  • Email alerts
  • Running scripts before / after a backup


In /etc/apt/sources.list add:
deb http://repo.nixdi.com/ubuntu/ precise soft

And execute in terminal:
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 74C7B31B5F4E1715 && apt-get update && apt-get install py4backup

The package is updated with the command:
apt-get update && apt-get upgrade py4backup

Alternatively, download the package manually:
wget http://repo.nixdi.com/ubuntu/py4backup_latest.deb

and install it:
dpkg -i ./py4backup_latest.deb

For distributions other than Ubuntu / Debian, run:
git clone https://github.com/larrabee/py4backup

Then copy the ddd and py4backup files to the directory for executables (usually /usr/bin) and the py4backup_lib.py file to the Python libraries directory. You will also need to install the dependencies manually: Python 3.x, btrfs-tools (btrfs-progs), lvm2 and rsync. The examples/ folder contains sample configuration files; copy them to /etc/py4backup/.


After installation, rename the example configuration files so that the script picks them up:
mv /etc/py4backup/py4backup.conf.example /etc/py4backup/py4backup.conf
mv /etc/py4backup/jobs.conf.example /etc/py4backup/jobs.conf

Then open py4backup.conf in a text editor.
Boolean parameters accept True / False, yes / no or 1 / 0.
A parameter can be separated from its value with either '=' or ':'.
Each parameter must be placed in its own section. The section name is written in square brackets ('[]') before the set of parameters that belongs to it.
The order of the sections, and of the parameters within a section, does not matter. If a parameter is not specified in the configuration file, its default value is used.
Example configuration file:

[MAIL]
send_mail_reports = True
login = login@test.com
passwd = password
sendto = recipient@test.com
server = mail.test.com
port = 25
tls = True

[DD]
bs = 4M
ddd_bs = 4096
ddd_hash = md5

[LOGGING]
logpath = /var/log/py4backup.log
enable_logging = True
log_with_time = True
traceback = False
command_output = True

[OTHER]
temp_snap_name = py4backup_temp_snap
host_desc = My Description
pathenv = /sbin:/usr/sbin
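
The syntax rules above match what Python's standard configparser module accepts. The sketch below is only an illustration of those rules, assuming the file is read with configparser (I do not show the script's own loading code here). The sample text and the fallback value are made up for the example.

```python
import configparser

# Hypothetical excerpt of py4backup.conf; section names follow the
# parameter descriptions in this article.
SAMPLE = """
[MAIL]
send_mail_reports = yes
port: 25
tls = 1

[LOGGING]
enable_logging = False
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# True/False, yes/no and 1/0 are all accepted as boolean values.
send_reports = parser.getboolean("MAIL", "send_mail_reports")
use_tls = parser.getboolean("MAIL", "tls")
logging_on = parser.getboolean("LOGGING", "enable_logging")

# Both '=' and ':' work as delimiters between a name and its value.
port = parser.getint("MAIL", "port")

# A parameter missing from the file falls back to a default chosen by the
# caller (this fallback is an assumption, not the script's real default).
server = parser.get("MAIL", "server", fallback="localhost")

print(send_reports, use_tls, logging_on, port, server)
```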

Let's consider the parameters in more detail.
[MAIL]: parameters for sending email notifications.
send_mail_reports: enables / disables sending an email report after a job completes.
login: login for the SMTP server.
passwd: password for the SMTP server.
sendto: recipients of the notification. Multiple addresses are separated by spaces.
server: domain name or IP address of the SMTP server.
port: port of the SMTP server.
tls: enables / disables TLS encryption.

[DD]: options for creating backups with the dd and ddd programs.
bs: block size for dd (used to create full copies of LVM volumes). The size can be given in bytes, kilobytes (k) or megabytes (M). It affects the speed of creating a copy. The optimal value is 32M.
ddd_bs: block size for ddd (used to create differential copies of LVM volumes), given in bytes. The larger the block, the more space the differential copy takes, but the faster it is created. The optimal value is 4096.
ddd_hash: block hashing algorithm: md5, crc32 or None. MD5 loads the processor more than crc32 and takes up more space, but the chance of collisions is much lower.
None disables checksums. The backup time, its size and the processor load are then minimal, but if the backup is damaged, you will not find out about it. Not recommended.
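
To make the size difference between the two algorithms concrete, here is a small sketch using Python's standard hashlib and zlib modules. It only compares the checksums of a single block; how ddd actually stores them is not shown, and the 4-byte big-endian encoding of the CRC is my assumption for the illustration.

```python
import hashlib
import zlib

block = bytes(4096)  # one block of ddd_bs = 4096 bytes, filled with zeros

md5_digest = hashlib.md5(block).digest()
crc_value = zlib.crc32(block)
crc_bytes = crc_value.to_bytes(4, "big")  # storage encoding is an assumption

print(len(md5_digest))  # 16 bytes of checksum per block
print(len(crc_bytes))   # 4 bytes of checksum per block

# A damaged block produces a different checksum with either algorithm.
damaged = b"\x01" + block[1:]
print(hashlib.md5(damaged).digest() != md5_digest)
print(zlib.crc32(damaged) != crc_value)
```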

[LOGGING]: job logging settings.
logpath: path to the log file. If you use a non-standard path, be sure to change the logrotate settings.
enable_logging: enables / disables logging.
log_with_time: enables / disables adding the date and time to each log entry.
traceback: enables / disables adding tracebacks to the log when errors occur. Useful for debugging.
command_output: enables / disables adding console output to the log. Useful for debugging.

[OTHER]: settings that do not belong to the other sections.
temp_snap_name: name of temporary snapshots. Used when creating copies of LVM volumes or of folders / files on the BTRFS file system. Do not change it unless you have to.
host_desc: text description of the host. The value of this parameter is added to the log file and to the email report.
pathenv: the value of this parameter is appended to the $PATH variable (if it passes validation). Several folders are separated by a colon (':'). For example, on Ubuntu, to create copies of LVM volumes when py4backup runs from cron, you need to add the /sbin folder to $PATH. Paths are given without a trailing slash ('/').


General information

The list of jobs is stored in the file /etc/py4backup/jobs.conf
Example of a job:

[xxx]
type = file-diff
sopath = server:/opt/
snpath =
dpath = /mnt/backup_dest/
dayexp = 30
prescript = bash /root/script1.sh
postscript = bash /root/script2.sh
include = test test2
exclude = tests*

[xxx]: the unique name of the job.
type: type of job. See details below.
sopath: backup source. For the file-full and file-diff types, you can specify a remote host as the source.
snpath: where to create the snapshot. Used only by the btrfs-full, btrfs-diff and btrfs-snap types.
dpath: where to save the backup. For the btrfs-full, btrfs-diff, file-full and file-diff types, you can specify a remote host as the destination.
dayexp: number of days after which old backups are deleted. If set to -1, backups are never deleted.
prescript: a command that runs before the backup. Pipes and other bash operators do not work. If you need to execute complex commands, save them as a script and run that script.
postscript: a command that runs after the backup. Otherwise it behaves like prescript.
include: what to include in the backup. See the description of the backup types for details.
exclude: what to exclude from the backup. See the description of the backup types for details.
Attention! All paths must end with '/'.
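
Two of the parameters above are easier to understand with a small sketch. The functions below are hypothetical helpers written for this article, not code taken from py4backup. The first shows one possible reading of dayexp (a backup expires once it is strictly older than dayexp days; the exact boundary is my assumption). The second shows why pipes do not work in prescript when a command is split into arguments and run without a shell.

```python
import shlex
from datetime import date, timedelta


def is_expired(backup_day, today, dayexp):
    """Return True if a backup made on backup_day should be deleted."""
    if dayexp == -1:  # -1 means "never delete"
        return False
    # Assumption: a backup expires once it is strictly older than dayexp days.
    return today - backup_day > timedelta(days=dayexp)


def command_to_argv(command):
    """Split a prescript / postscript value into arguments, without a shell."""
    return shlex.split(command)


today = date(2014, 6, 30)
print(is_expired(date(2014, 5, 1), today, 30))
print(is_expired(date(2014, 5, 1), today, -1))

# Without a shell, '|' is just another argument handed to the first program.
print(command_to_argv("bash /root/script1.sh | tee log"))
```

With this approach the pipe character reaches bash as a plain argument instead of connecting two processes, which is why complex commands belong in a separate script.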

Types of backup

In each job, the type parameter specifies the type of backup. It determines how the backup is created and how some of the other parameters behave.
There are 7 types of backups in py4backup:
  • file-full
  • file-diff
  • btrfs-full
  • btrfs-diff
  • btrfs-snap
  • lvm-full
  • lvm-diff

Let's take a closer look.


file-full

Creates a full backup using rsync. The folder specified in sopath is backed up, including everything mounted below it.
In sopath and dpath you can specify not only local folders but also remote hosts. For example:
sopath = root@ /home/admin/ or dpath = server:/home/admin/. In the second case, the ~/.ssh/config file must contain a valid entry for the host. Key-based authentication is used (for details, see the wiki of your distribution).
You cannot make both sopath and dpath remote hosts at the same time.
The values specified in include and exclude are passed to rsync as the --include= and --exclude= options. You can specify multiple values separated by spaces.
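
As a rough illustration of how include and exclude could turn into rsync options, here is a hypothetical helper. It is not the script's real code: the -aAX flags are borrowed from the restore examples in this article, and the real script most likely also adds a dated subfolder to the destination.

```python
def build_rsync_command(sopath, dpath, include="", exclude=""):
    """Build an rsync argument list from job parameters (illustrative only)."""
    command = ["rsync", "-aAX"]  # flags are an assumption, see the text above
    # Each space-separated value becomes its own option.
    command += ["--include=" + pattern for pattern in include.split()]
    command += ["--exclude=" + pattern for pattern in exclude.split()]
    command += [sopath, dpath]
    return command


print(build_rsync_command("server:/opt/", "/mnt/backup_dest/",
                          include="test test2", exclude="tests*"))
```

rsync checks filter rules in order and acts on the first match, so placing the include options before the exclude options lets an include override a broader exclude.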


file-diff

Creates a differential backup from the source (sopath) and the last full copy found in the destination folder (dpath). If no full copy is found, the job fails.
The parameters are the same as for the 'file-full' type.


btrfs-full

This type is similar to 'file-full', but before the backup is created, a snapshot of the directory being backed up is taken, and the copy is made from that snapshot.
For this type of backup, you must specify the snpath parameter. A temporary snapshot of the source folder (sopath) is created in the folder specified in snpath. That path must be on the same file system as the folder specified in sopath. Note that only the contents of this subvolume are copied. All mounted folders and nested subvolumes are ignored. The other parameters are the same as for the 'file-full' type.


btrfs-diff

With this type of backup, a snapshot of the source folder (sopath) is taken first, and then a differential copy is created from the snapshot and the last full copy found in the destination folder (dpath). If no full copy is found, the job fails.
As with the 'btrfs-full' type, the snapshot folder (snpath) must be on the same file system as the source folder (sopath).
Note that only the contents of this subvolume are copied. All mounted folders and nested subvolumes are ignored. The other parameters are the same as for the 'file-full' type.


btrfs-snap

This type creates snapshots of the source folder specified in sopath in the snapshot folder specified in snpath.
For this type, the exclude, include and dpath parameters have no effect. As with the 'btrfs-full' type, the snapshot folder (snpath) must be on the same file system as the source folder (sopath).
Note that only the contents of this subvolume are included. All mounted folders and nested subvolumes are ignored.


lvm-full

This type is designed to create full copies of LVM volumes. It has some particular features. The sopath parameter specifies the path to the volume group (VG). For example:
sopath = /dev/main_vg/
By default, the script makes a copy of all volumes in this VG.
The dpath parameter specifies where to save the backup. You cannot specify a remote host as the backup destination. To copy only the volumes you need, use the include and exclude parameters.
The exclude parameter specifies which volumes to exclude from the backup. It also accepts the keyword all, which means that all volumes are excluded.
The include parameter specifies which volumes to include in the backup. It takes precedence over exclude. For instance:
exclude = all
include = mail root

will back up only the mail and root volumes. The following example will copy all volumes except the mail volume:
exclude = mail
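
The precedence rules for include and exclude can be written down as a short function. This is a hypothetical helper that follows the description above, not code from the script, and the volume names are invented for the example.

```python
def select_volumes(volumes, include="", exclude=""):
    """Pick the volumes to back up; include takes precedence over exclude."""
    included = set(include.split())
    excluded = set(exclude.split())
    exclude_all = "all" in excluded  # the keyword 'all' excludes every volume

    selected = []
    for name in volumes:
        if name in included:
            selected.append(name)  # explicitly included, so exclude is ignored
        elif exclude_all or name in excluded:
            continue
        else:
            selected.append(name)
    return selected


volumes = ["mail", "root", "swap", "data"]  # hypothetical volumes in the VG
print(select_volumes(volumes, include="mail root", exclude="all"))
print(select_volumes(volumes, exclude="mail"))
```

The first call returns only mail and root, the second returns everything except mail.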


lvm-diff

The last backup type (as of version 1.5) is designed to create differential copies of LVM volumes.
The script looks for the last full backup in the destination folder (dpath) and, if it finds one, creates a differential copy between it and a snapshot of the current state. As a result, two files, *-diff.dd and *-diff.ddm, appear in the destination folder. BOTH are required for recovery.
All parameters are the same as for the lvm-full type.
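
To show the general idea behind a block-level differential copy, here is a minimal in-memory sketch. It is a generic illustration of the technique and does not reproduce ddd's actual algorithm or the format of its output files. The function names are hypothetical, and both images are assumed to have the same size.

```python
import hashlib
import io


def make_diff(full, current, block_size=4096):
    """Return {block_number: (data, md5_hex)} for the blocks that differ."""
    changed = {}
    index = 0
    while True:
        old = full.read(block_size)
        new = current.read(block_size)
        if not new:
            break
        if new != old:
            changed[index] = (new, hashlib.md5(new).hexdigest())
        index += 1
    return changed


def apply_diff(full, changed, block_size=4096):
    """Rebuild the current image from the full copy plus the changed blocks."""
    image = bytearray(full.read())
    for index, (data, checksum) in changed.items():
        if hashlib.md5(data).hexdigest() != checksum:
            raise ValueError("block %d of the differential copy is damaged" % index)
        start = index * block_size
        image[start:start + len(data)] = data
    return bytes(image)


full_image = b"AAAA" + b"BBBB" + b"CCCC"
current_image = b"AAAA" + b"BXBB" + b"CCCC"

diff = make_diff(io.BytesIO(full_image), io.BytesIO(current_image), block_size=4)
print(sorted(diff))
restored = apply_diff(io.BytesIO(full_image), diff, block_size=4)
print(restored == current_image)
```

Even though only one byte changed, the whole block is stored in the diff. This is why a larger block size makes the differential copy bigger.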


Running the required jobs is very simple.
Specify the --jobs (or -j) switch followed by the names of the jobs to run. For example:
py4backup --jobs backup_data backup_home backup_media
All the specified jobs are executed sequentially, in the order in which they are listed after --jobs. You can also run the script from cron, but remember that the cron environment may differ from the user's, so you may need to add the paths to the rm, dd, rsync, btrfs, lvcreate and lvremove utilities to the pathenv parameter in the configuration file.
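
Before putting a job into cron, it can help to check which utilities would not be found with a given PATH. The sketch below uses shutil.which from the standard library. The helper name and the sample PATH values are assumptions for the example, and this check is not part of py4backup.

```python
import os
import shutil

# Utilities mentioned in this article as needed by the script.
REQUIRED_TOOLS = ["rm", "dd", "rsync", "btrfs", "lvcreate", "lvremove"]


def missing_tools(pathenv, base_path=None):
    """List the required utilities that cannot be found on base_path + pathenv."""
    if base_path is None:
        base_path = os.environ.get("PATH", "")
    search_path = os.pathsep.join(p for p in (base_path, pathenv) if p)
    return [tool for tool in REQUIRED_TOOLS
            if shutil.which(tool, path=search_path) is None]


# Simulate a minimal cron-like PATH extended by the pathenv value.
print(missing_tools("/sbin:/usr/sbin", base_path="/usr/bin:/bin"))
```

An empty list means every utility was found. Any names that are printed point to folders that still need to be added to pathenv.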


Now we come to the most interesting part. A backup is worth nothing without the ability to restore it quickly. In this section, I will describe typical cases of recovery from backups created by the script.

File backups

The following applies to both full and differential backups made by jobs of the btrfs-full, btrfs-diff, file-full and file-diff types. Restoring a backup requires rsync with the -aAX switches. For instance:
rsync -aAX /mnt/backup/home/2014-06-21-full/ /home/

rsync -aAX /mnt/backup/home/2014-06-22-diff/ /home/

In both cases, the destination folder will contain a complete, ready-to-use copy of the data.

Snapshot Recovery

Snapshots created by the btrfs-snap type can be restored in several ways:
  • Like a file backup, by copying the data with rsync (slow and not interesting).
  • By putting the snapshot in place of the folder that needs to be restored. We will consider this method below.

By default, snapshots are created read-only, so you cannot write to them directly. Consider an example.
BTRFS is used as the root file system. The script creates snapshots of the /home folder and stores them in /snapshots_home. Now the day has come when we need to restore /home from a snapshot.
The first step is to free the /home path (rename or delete the folder).
Next, select the snapshot you need (say, the one for 2014-06-19) and create a snapshot of it (yes, a snapshot of a snapshot):
btrfs subvolume snapshot /snapshots_home/2014-06-19 /home

This way the data becomes writable and is also protected: even when the script's rotation removes the 2014-06-19 snapshot, our freshly created snapshot stays intact.

Recovery of full LVM backups

This is very simple.
Create a new LVM volume of the same size as the backup or larger, and copy the backup to it with dd:
dd if=/backups/2014-06-19-old_volume-full of=/dev/main_vg/new_volume bs=32M

Recovery of differential LVM backups

For this kind of recovery, you need the ddd utility that comes bundled with py4backup.
To restore, pass the --restore option, the -s switch with the path to the backup file WITHOUT THE EXTENSION, and the -r switch with the restore destination (a block device or a file). ddd remembers the path to the full backup, but if the full backup has been moved, you must specify its new path manually with the -f switch.
Suppose there are backups in the /backup folder:

root@virtserver / # ls /backup/

We want to restore the backup for 2014-06-19 to the device /dev/main_vg/volume. To do this, run the command:

ddd --restore -s /backup/2014-06-19-volume-diff -r /dev/main_vg/volume

Now suppose the full copy has been moved to the /backup_old/ folder:

ddd --restore -s /backup/2014-06-19-volume-diff -r /dev/main_vg/volume -f /backup_old/2014-06-18-volume-full

After recovery, ddd will display a list of damaged blocks indicating the file where the damaged block is located. The full23 record indicates damage to block number 23 in the full copy file, and the diff24 record indicates damage to block 24 in the differential copy.
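
If the list of damaged blocks is long, it can be sorted by file with a few lines of Python. This sketch relies only on the full23 / diff24 record form described above. The exact layout of ddd's output is not shown in this article, so the one-record-per-item input is an assumption, and the helper is hypothetical.

```python
import re

# A record is the file kind ('full' or 'diff') followed by the block number.
RECORD = re.compile(r"^(full|diff)(\d+)$")


def parse_damaged(records):
    """Group damaged block numbers by the file they belong to."""
    damaged = {"full": [], "diff": []}
    for record in records:
        match = RECORD.match(record.strip())
        if match is None:
            continue  # skip anything that does not look like a record
        damaged[match.group(1)].append(int(match.group(2)))
    return damaged


print(parse_damaged(["full23", "diff24"]))
```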

Tips & Tricks

Here I will cover some non-obvious points and ways of using the script.
  • If you run the script without parameters, it prints the list of available jobs.
  • ddd can be used separately from py4backup. The path to the full copy is specified with the -f switch, and the path to the modified file with the -s switch. The -r switch specifies where to save the differential copy. Example:
    ddd -f /backup/2014-06-18-volume-full -s /dev/main_vg/volume_snapshot -r /backup/diff_backup_name
  • If you want to verify a differential copy created by ddd without restoring it, specify /dev/null as the destination.


Disclaimer: the author of the script is not responsible for any action or inaction of the program that results in loss of or damage to data.
The script has bugs (mostly minor; it runs stably for me on 4 test machines), and I will be grateful for bug reports, especially with tracebacks and console output of the commands.
This manual applies to version 1.5.3.
You can contact me by email at larrabee@nixdi.com or through Habr.
Source code on GitHub: https://github.com/larrabee/py4backup
Packages in the repository: http://repo.nixdi.com/ubuntu/
Thanks for reading; I will be grateful for your comments.
