Backing up virtual machines in a QEMU/KVM hypervisor environment


As you know, backups must be made, and moreover, they must be made so that you can actually recover from them later. This is especially true for virtual machines (VMs). Let's look at how to back up the virtual disks of a machine in a QEMU/KVM environment. There are two main problems here. First, you need to get a consistent (integral) backup: if we have a DBMS or other software that actively uses its own write cache, then before the backup it must be asked to flush that cache and freeze writes to disk, otherwise the snapshot will capture an inconsistent state rather than the actual data, and on restore the DBMS may not appreciate such a trick. Second, there is the question of VM performance in snapshot mode: it would be nice if the VM did not slow down too much while we make the copy, and did not freeze when we delete the snapshot.

I’ll answer the first question right away: to get a consistent backup, shut down the VM from the guest OS before creating the backup - then the backup is guaranteed to be consistent. If that approach works for you, you can stop reading here. If not, read on.

So, to get a consistent backup without shutting down the VM, you need to use the QEMU guest agent (not to be confused with the SPICE guest agent for QEMU, or with paravirtual drivers for QEMU in general). In Debian Jessie the qemu-guest-agent package is in the main repository; in Wheezy it is only available through wheezy-backports. The QEMU guest agent is a small utility that receives commands from the host through a virtio channel named org.qemu.guest_agent.0 and executes them in the guest context. On the hypervisor side, the channel ends in a UNIX socket into which you can write text commands using the socat utility. However, Libvirt likes to claim this channel for itself, so if you manage the hypervisor through Libvirt, you will have to talk to the guest via the “virsh qemu-agent-command” command (see the examples after the list below). The set of QEMU guest agent commands depends on the version; for example, here is my list:

  • guest-set-vcpus
  • guest-get-vcpus
  • guest-network-get-interfaces
  • guest-suspend-hybrid
  • guest-suspend-ram
  • guest-suspend-disk
  • guest-fstrim
  • guest-fsfreeze-thaw
  • guest-fsfreeze-freeze
  • guest-fsfreeze-status
  • guest-file-flush
  • guest-file-seek
  • guest-file-write
  • guest-file-read
  • guest-file-close
  • guest-file-open
  • guest-shutdown
  • guest-info
  • guest-set-time
  • guest-get-time
  • guest-ping
  • guest-sync
  • guest-sync-delimited
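
For illustration, here is roughly what talking to the agent looks like. The VM name myvm, the chardev options and the socket path below are assumptions that depend on your configuration:

# With plain QEMU, assuming it was started with something like:
#   -chardev socket,path=/var/run/qga-myvm.sock,server=on,wait=off,id=qga0
#   -device virtio-serial
#   -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0
echo '{"execute": "guest-ping"}' | socat - UNIX-CONNECT:/var/run/qga-myvm.sock

# Under Libvirt, which claims the channel for itself, go through virsh instead:
virsh qemu-agent-command myvm '{"execute": "guest-ping"}'
virsh qemu-agent-command myvm '{"execute": "guest-fsfreeze-freeze"}'
virsh qemu-agent-command myvm '{"execute": "guest-fsfreeze-status"}'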

A brief description of the commands can be found in the qga/qapi-schema.json file in the QEMU sources, and a complete picture can be obtained by analyzing the qga/commands-posix.c and qga/commands-win32.c files. From that analysis you can, for example, find out that the commands guest-set-vcpus, guest-get-vcpus, guest-network-get-interfaces, guest-suspend-hybrid, guest-suspend-ram and guest-suspend-disk are not supported under Windows, and that the guest-fsfreeze-freeze / guest-fsfreeze-thaw commands under Windows try to use the Volume Shadow Copy Service (VSS). However, since this article focuses on Linux as a guest, these subtleties will not concern us.

Of the entire list of commands, we are interested in guest-fsfreeze-freeze and guest-fsfreeze-thaw. As the names imply, the first “freezes” the guest's file systems and the second “thaws” them. The fsfreeze command (or rather, ioctl) is not a QEMU feature but a capability of the guest's virtual file system layer, which has been in the Linux kernel for quite some time. That is, you can freeze file systems not only in a virtual environment but also on real hardware - just use the fsfreeze utility from the util-linux package. The fsfreeze man page says that Ext3/4, ReiserFS, JFS and XFS are supported, but fsfreeze successfully “froze” Btrfs as well. Before the actual “freeze”, but after all in-flight write operations have completed, the kernel calls sync() (file fs/super.c, line 1329), so you need not worry about data integrity.
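
On real hardware it looks like this (the mount point is illustrative):

# block all writes to the FS mounted at /srv/data and flush it to disk
fsfreeze --freeze /srv/data
# ... take a storage-level snapshot here ...
# resume writes
fsfreeze --unfreeze /srv/data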

So, we know that to get a consistent snapshot we need to call the guest-fsfreeze-freeze function in the guest via the QEMU guest agent. But perhaps we are worrying in vain and this function is already called automatically when a snapshot is created? Alas, that is not the case in Libvirt (2.9), Proxmox (the pvetest branch) or OpenStack: to automate the call of guest-fsfreeze-freeze you would have to edit the source code of the corresponding products, which is beyond the scope of this article.

Libvirt can freeze the guest's file systems after all
As the respected Infod points out, when creating a snapshot you can pass the --quiesce parameter to the virsh shell that ships with Libvirt; it will call guest-fsfreeze-freeze while creating the snapshot:
virsh snapshot-create-as myvm snapshot1 "snapshot1 description" --disk-only --atomic --quiesce

Suppose we have found a way (for example, a self-written script) to “freeze” the guest's file systems before taking a snapshot. Now we face the next task: notifying the guest software immediately before the freeze. The QEMU guest agent supports the -F parameter, which tells it to call the /etc/qemu/fsfreeze-hook script before “freezing” and after “thawing”, with the freeze and thaw arguments respectively. In Debian you therefore have to edit the agent's startup script (/etc/init.d/qemu-guest-agent) and set DAEMON_ARGS="-F". Keep in mind that if the hook script fails, the file systems will not be “frozen”.

For a MySQL server, a first - but broken - version of such a script might look something like this:

#!/bin/bash

USER="<user>"
PASSWORD="<password>"

case "$1" in
  freeze  )
            /usr/bin/mysql -u $USER -p$PASSWORD -e "FLUSH TABLES WITH READ LOCK;"
            exit $?
            ;;
  thaw    )
            /usr/bin/mysql -u $USER -p$PASSWORD -e "UNLOCK TABLES;"
            exit $?
            ;;
  *       )
            logger "Fsfreeze script was activated with unknown parameter: $1"
            exit 1
            ;;
esac
exit 1


In fact, the database lock will be released immediately upon completion of the command
mysql -u $USER -p$PASSWORD -e "FLUSH TABLES WITH READ LOCK"
because all locks in MySQL are held only while the session that took them remains connected. For a correct backup you will have to write an additional small service (for example, in Python) that opens a connection to the MySQL database and takes the lock on the freeze command, then keeps the connection open while waiting for the thaw command.
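
As a rough sketch of this idea (here in bash rather than as a full Python service; the credentials, the PID file and the SLEEP timeout are placeholders, and waiting with sleep instead of verifying that the lock was actually taken is a simplification), the hook could hold the locking session open in a background client:

#!/bin/bash
USER="<user>"
PASSWORD="<password>"
PIDFILE=/var/run/mysql-fsfreeze.pid

case "$1" in
  freeze)
        # take the global read lock and keep the session (and thus the lock)
        # alive for up to 10 minutes while the snapshot is taken
        /usr/bin/mysql -u "$USER" -p"$PASSWORD" \
            -e "FLUSH TABLES WITH READ LOCK; SELECT SLEEP(600);" &
        echo $! > "$PIDFILE"
        sleep 2    # crude: give the client time to actually take the lock
        ;;
  thaw)
        # killing the client closes its session, which releases the lock
        kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
        ;;
esac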

What about Windows as a guest?
It must be said that for Windows and MS SQL this same procedure requires no extra effort at all: the QEMU guest agent automatically calls the corresponding function of the Volume Shadow Copy Service (VSS), and VSS informs all its subscribers that a backup is about to begin and that it would be a good idea to “flush” to disk, and so on.


So, we have locked the MySQL tables and “frozen” the guest's file system - it is time to take the backup. Suppose we store VM disk images as qcow2 files rather than, for example, LVM volumes. Even in this case we have many options to choose from, and it is worth understanding them.

Internal QEMU snapshot
  • Implemented by: QEMU
  • QEMU command: savevm / snapshot_blkdev_internal
  • Libvirt/virsh command: snapshot-create / snapshot-create-as
  • Form: entries inside the disk image itself
  • Scope: a specific VM
  • Technique: writes are redirected to another area of the same file
  • Copying the snapshot to backup storage: qemu-nbd / nbd-client
  • VM disk write performance while the snapshot exists: average (each write requires two sync() calls; the qcow2 lazy_refcounts option improves the situation)
  • Storage load when deleting (committing) the snapshot: below average (metadata must be rewritten)
  • Storage load on rollback to the snapshot: below average (metadata must be rewritten)

External QEMU snapshot
  • Implemented by: QEMU
  • QEMU command: snapshot_blkdev
  • Libvirt/virsh command: snapshot-create / snapshot-create-as
  • Form: a separate file (disk image)
  • Scope: a specific VM
  • Technique: writes are redirected to another file
  • Copying the snapshot to backup storage: copy the file
  • VM disk write performance while the snapshot exists: high
  • Storage load when deleting (committing) the snapshot: high (the data must be copied back into the original image and the metadata rewritten)
  • Storage load on rollback to the snapshot: low (delete the file)

QEMU backup
  • Implemented by: QEMU
  • QEMU command: drive_backup
  • Form: a separate file (disk image)
  • Scope: a specific VM
  • Technique: full copy of the machine's disks to another file
  • Copying the backup to backup storage: copy the file
  • VM disk write performance while the backup runs: high
  • Storage load when deleting the backup: low (delete the file)
  • Storage load on rollback: low for Libvirt (replace the file), high for Proxmox (the file must be unpacked from an archive)

Snapshot of an LVM volume holding qcow2 files
  • Implemented by: OS
  • OS command: lvcreate
  • Form: a block device
  • Scope: the entire storage
  • Technique: copy-on-write; original data is copied to the snapshot device when it changes
  • Copying the snapshot to backup storage: mount the snapshot, copy the files
  • VM disk write performance while the snapshot exists: about 2 times lower than usual
  • Storage load when deleting the snapshot: low (remove the block device)
  • Storage load on rollback to the snapshot: high (the data must be copied back to the original block device)

Btrfs snapshot of the FS holding qcow2 files
  • Implemented by: OS
  • OS command: btrfs subvolume snapshot
  • Form: an FS (directory) with disk images
  • Scope: the entire storage
  • Technique: writes are redirected to another area of the file system
  • Copying the snapshot to backup storage: copy the file
  • VM disk write performance while the snapshot exists: high
  • Storage load when deleting (committing) the snapshot: below average (metadata must be rewritten)
  • Storage load on rollback to the snapshot: below average (metadata must be rewritten)

Each method has its pros and cons. The “Internal” method is de facto the standard in the Virt-Manager utility and the Proxmox environment, and taking snapshots of this format is automated there. However, to “pull” the snapshot out of the file, you have to start an NBD server based on qemu-nbd and attach the image file to it. With the “External” method, a backup file ready for copying appears as part of creating the snapshot, but deleting the snapshot is not simple: it involves “returning” (block-commit) the data written to the snapshot file back into the base image, which multiplies the write load while the snapshot is being removed. For comparison, VMware ESXi in the same situation “sags” in write performance by a factor of 5. There is also another way to remove an “External” snapshot - copying all the blocks from the original image into the snapshot file. This method is called block-stream; I cannot judge whether it is appropriate in production, but it would obviously make a good storage benchmark.
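
To make the “External” lifecycle concrete, here is a hedged sketch with virsh; the domain name myvm, the disk name vda and the paths are examples that depend on your storage layout:

# create an external disk-only snapshot; writes now go to a new overlay file
virsh snapshot-create-as myvm backup1 --disk-only --atomic --quiesce
# the original image is now a read-only backing file and safe to copy
cp /var/lib/libvirt/images/myvm.qcow2 /backup/myvm.qcow2
# merge the overlay back into the base image and switch the VM to it
virsh blockcommit myvm vda --active --pivot
# remove the now-stale snapshot metadata
virsh snapshot-delete myvm backup1 --metadata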

A snapshot of an LVM volume causes a drop in write performance on the main volume, so it is best used when you are sure that the disk will not be written to intensively while the snapshot exists.
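
A sketch with illustrative names (the snapshot volume must be large enough to hold every block changed while it exists):

# snapshot the LV that holds the qcow2 images
lvcreate --snapshot --size 10G --name images-snap /dev/vg0/images
# mount it read-only and copy the images off
mount -o ro /dev/vg0/images-snap /mnt/snap
cp /mnt/snap/myvm.qcow2 /backup/
umount /mnt/snap
# dropping the snapshot is cheap
lvremove -f /dev/vg0/images-snap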

Great prospects are opened up by using Btrfs as the file system for storing disk images, since in this case snapshots, compression and deduplication are provided by the architecture of the FS itself. The downsides: Btrfs cannot be used as a shared FS in a cluster environment, and besides, Btrfs is a relatively young FS and is possibly less reliable than the combination of LVM and ext4.
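
A sketch, assuming /srv/images is a Btrfs subvolume holding the qcow2 files:

# create a read-only snapshot of the subvolume with the images
btrfs subvolume snapshot -r /srv/images /srv/images-backup
# copy the images off the snapshot, then drop it
cp /srv/images-backup/myvm.qcow2 /backup/
btrfs subvolume delete /srv/images-backup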

The method of getting backups with the drive_backup command is good in that it can create the backup directly on mounted remote storage, but in that case it puts a heavy load on the network. For the remaining methods, you can arrange to transfer only changed blocks using rsync. Unfortunately, QEMU backup does not support transferring only “dirty” (changed since the last backup) blocks, as implemented, for example, in VMware's CBT mechanism. Both attempts to implement such a mechanism - livebackup and the in-memory dirty bitmap - apparently failed: the first because of its architecture (it added an extra daemon and a separate network protocol for this one operation), and the second because of obvious application restrictions: a dirty block map can only be stored in RAM.
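
For reference, a drive-backup call might look like this through the QMP monitor (a sketch; the device alias and paths are assumptions - list the real ones with “info block”):

virsh qemu-monitor-command myvm --hmp "info block"
virsh qemu-monitor-command myvm '{"execute": "drive-backup",
  "arguments": {"device": "drive-virtio-disk0", "sync": "full",
                "target": "/mnt/backup/myvm-vda.qcow2", "format": "qcow2"}}'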

In conclusion, consider the situation where a VM has several attached disk images. Obviously, for such a VM you need to create snapshots of all disks simultaneously. If you use Libvirt, you have nothing to worry about: the library takes care of synchronizing the snapshots itself. But if you want to perform this operation on “bare” QEMU, there are two ways to do it: pause the VM with the stop command, take the snapshots and then resume the VM with the cont command, or use the transactional command execution mechanism available in QEMU. You cannot rely on the QEMU guest agent and the guest-fsfreeze-freeze / guest-fsfreeze-thaw commands alone here, because even though the agent “freezes” all mounted file systems with one command, it does so not simultaneously but sequentially, so the volumes can end up out of sync with each other.
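
A sketch of such a transaction over QMP, taking snapshots of two disks atomically (the device aliases and file names are examples):

virsh qemu-monitor-command myvm '{
  "execute": "transaction",
  "arguments": { "actions": [
    { "type": "blockdev-snapshot-sync",
      "data": { "device": "drive-virtio-disk0",
                "snapshot-file": "/var/lib/libvirt/images/myvm-vda.snap1.qcow2",
                "format": "qcow2" } },
    { "type": "blockdev-snapshot-sync",
      "data": { "device": "drive-virtio-disk1",
                "snapshot-file": "/var/lib/libvirt/images/myvm-vdb.snap1.qcow2",
                "format": "qcow2" } }
  ] }
}'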

If you find a mistake in the article, or have something to add, please say so in the comments.

Make backups, gentlemen!
