Virtual machine backup system, or how to save your organization money

Virtualization is a convenient and sensible solution for a modern enterprise with many tasks. It lets you spread tasks across different servers and split their administration among several employees, each responsible for their own server. The result is a distributed IT infrastructure that is nevertheless concentrated on one or a few physical servers: by placing several virtual machines on the same physical server, virtualization makes better use of the resources of modern multiprocessor systems.

A virtual machine is represented by several files on the host's hard drive, so it can be copied, deleted and restored like any other files. It is simple: there are some files (the configuration and the virtual hard disks), they run under a hypervisor, and everything works like a real server. This really does save the enterprise money and the administrator time. A cluster is a particularly good solution: several hosts are combined, and when one of them fails, its virtual machines move to another.

In our organization we use a failover cluster based on Windows Server 2012 R2 Datacenter (a Hyper-V cluster). From my limited experience, a cluster is quite convenient: you can free up any host for administrative tasks, such as installing updates or software, simply by migrating its virtual machines to another host. And if one of the hosts "burns out", its virtual machines come back up on another host after only a short interruption, as long as failover works ("the main thing is that it works"). If it does not, and a virtual machine gets damaged, the data can still be pulled directly out of its virtual hard disk file.
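Both operations from the paragraph above map onto standard cmdlets. A minimal sketch, assuming the FailoverClusters and Hyper-V modules are installed on the host; the wrapper function names are hypothetical, and node names and disk paths are whatever you pass in. Draining a node moves its virtual machines elsewhere before maintenance, and mounting a virtual disk read-only is one way to get at the data inside it without modifying the file.

```powershell
# Hypothetical wrappers around FailoverClusters and Hyper-V cmdlets.

function Start-NodeMaintenance {
    param([Parameter(Mandatory)][string]$NodeName)
    # -Drain moves the node's roles (virtual machines) to other nodes and pauses the node
    Suspend-ClusterNode -Name $NodeName -Drain
}

function Stop-NodeMaintenance {
    param([Parameter(Mandatory)][string]$NodeName)
    # Resume the node and move its roles back right away
    Resume-ClusterNode -Name $NodeName -Failback Immediate
}

function Mount-VhdReadOnly {
    param([Parameter(Mandatory)][string]$Path)
    # Attach the virtual disk read-only so the original file is not modified,
    # then list its volumes to see which drive letters were assigned
    Mount-VHD -Path $Path -ReadOnly -Passthru | Get-Disk | Get-Partition | Get-Volume
}
```

When the data has been copied out, `Dismount-VHD -Path <file>` detaches the disk again.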

Most of our problems were damaged or missing XML files, which hold a virtual machine's configuration or part of its snapshot information. This happened mostly in three situations: when a host rebooted and a virtual machine failed to migrate to another cluster host (all the problem machines were running Windows Server 2008); when a virtual hard disk ran out of space and, for some reason, the snapshot disappeared; and when a snapshot was removed but its disks were never merged. We had no desire to find out each time what exactly went wrong and which shaman's tambourine to dance with. So we took the following steps to optimize the system:

  • moving all virtual machines and hosts to Windows Server 2012 R2 Datacenter, because in practice it proved quite stable and fast: it shuts down, starts and reboots quickly, and it saves budget thanks to AVMA (Automatic Virtual Machine Activation), which activates an unlimited number of 2012 R2 virtual machines running on the host;
  • removing all snapshots from virtual machines (snapshots can make a virtual machine grow significantly);
  • shrinking the fixed-size virtual hard disks to only as much space as is needed (for example, for database servers), especially since a disk can always be expanded later;
  • using dynamically expanding virtual hard disks for virtual machines that do not need fast disk access (for example, domain controllers, license servers, etc.).
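The clean-up steps listed above correspond to standard Hyper-V cmdlets. A sketch, assuming the Hyper-V module on the host; the wrapper names are hypothetical and the paths are whatever you pass in.

```powershell
# Hypothetical wrappers for the snapshot and disk clean-up steps.

function Get-VmCheckpointReport {
    # One line per snapshot: which VM owns it and when it was created
    Get-VM | Get-VMSnapshot | Select-Object VMName, Name, CreationTime
}

function Remove-AllVmCheckpoints {
    param([Parameter(Mandatory)][string]$VmName)
    # Removing snapshots makes Hyper-V merge the differencing (.avhd/.avhdx) disks
    # back into their parents; check afterwards that no .avhdx files remain
    Get-VMSnapshot -VMName $VmName | Remove-VMSnapshot
}

function Convert-ToDynamicDisk {
    param(
        [Parameter(Mandatory)][string]$SourcePath,
        [Parameter(Mandatory)][string]$DestinationPath
    )
    # Convert-VHD writes a new file; the VM should be off, and it must be
    # reattached to the new disk before the old one is deleted
    Convert-VHD -Path $SourcePath -DestinationPath $DestinationPath -VHDType Dynamic
}
```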

Our fleet of virtual machines began to take up less disk space, and the machines were separated by task and given the resources each task requires.

All this was preparation for building a backup system for virtual machines. We started looking at existing ready-made products but ran into the main problem: money. As always! We were told: "There is money, but not enough..." which means "We won't give you anything!" Still, we had not lost hope of funding, so we looked at paid options first, starting with DPM from Microsoft. We read up on it: there are many good reviews, but also many problems.

We tried to deploy it several times. The first time MS SQL died for no apparent reason; the second time brought the most epic outcome imaginable: DPM killed itself! We had created two virtual machines, 1) a SQL server and 2) DPM with an attached physical disk (these machines were not in the cluster but on a separate server, Hyper-V Rezerv), installed agents in the cluster and started testing. Everything worked, everything was backed up. Then we tested recovery, restoring to the Hyper-V Rezerv server, and saw the message "Sql-server virtual machine will be deleted". It turned out that the ID of the machine being restored matched the ID of the sql-server machine. We had tested recovery on our reference virtual machine, which we normally copy by export to create new virtual machines, but this time the sql-server had been created by registering the reference machine in place... So we ended up with two virtual machines with the same ID: the reference one in the cluster and the other on the backup Hyper-V Rezerv host. When restoring a virtual machine, DPM "looks" at the ID and, if it matches, deletes that virtual machine and restores the copy in its place. It was a very nasty and funny mistake. I learned for good that a virtualization system must never contain machines with the same ID, even on different hosts or in different clusters.
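One way to catch this before a backup tool does is to collect the virtual machines from all hosts and group them by ID. A sketch; the function name is hypothetical, and it accepts any objects with `Id`, `Name` and `ComputerName` properties, which is what `Get-VM` returns.

```powershell
# Hypothetical helper: report VM IDs that occur on more than one machine entry.

function Find-DuplicateVmId {
    param(
        # Objects with Id, Name and ComputerName properties (e.g. output of Get-VM)
        [Parameter(Mandatory)][object[]]$Vm
    )
    $Vm |
        Group-Object -Property { [string]$_.Id } |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            [pscustomobject]@{
                Id       = $_.Name
                # host\VM pairs that share this ID, in input order
                Machines = ($_.Group | ForEach-Object { "$($_.ComputerName)\$($_.Name)" }) -join ', '
            }
        }
}
```

Usage, with the cluster node names from later in this article (add your backup host to the list): `Find-DuplicateVmId -Vm (Get-VM -ComputerName 'Gse264-bl1', 'Gse264-bl2', 'Gse264-bl3')`. An empty result means every ID is unique.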

I thought: "Damn it, now I'll have to set it all up again." But with every subsequent installation, from different distributions and even on different hosts, the same unknown error popped up during setup. Forums said: a corrupted distribution. But yesterday everything was fine! And we downloaded it several more times... Magic, the evil eye, or just Microsoft being Microsoft... A tragic story. So we gave up on it and started looking elsewhere.

DPM is a pretty good and inexpensive product, but it uses MS SQL Server Standard and stores backups on a dynamic disk. I focus on this because it is inconvenient: SQL Server costs money and is complicated, and without its databases the dynamic disk is useless; if the database is damaged, that's it... all backups are lost... or you need a very strong tambourine. It is not clear why it does not use a basic disk, which could be attached anywhere far more easily than a dynamic one. DPM did not work with our tape storage either: it kept inventorying tapes but never wrote anything to them.

The next product, which I definitely liked, is Veeam. It is very convenient, and its green color is a pleasant change from gray admin interfaces, but the price is staggering for our organization (we have a cluster of 6 dual-processor blades). We tested it and would be ready to use it if we had the money. But alas.

While reviewing backup products, we backed up manually, then with a PowerShell script. Later we decided to finish the script so that it would do the whole job.

Some will say it was in vain... Why reinvent the wheel... There are decent free programs.

I chose a script because I don't need a database server. All I need is a "what to back up and when" table and debug logs. Besides, it's free, and it's our own development.

But there is one significant drawback: paid and free programs can pull data such as files or databases directly out of a virtual machine, while the script can only copy the entire virtual machine. We are working on this, and maybe I will demonstrate it in future articles. I'm still a newcomer: 2 years in administration and 3 years of working with users, and, you know, it's easier to communicate with servers.

So we planned to back up entire virtual machines. Why? Because:
  • it makes no sense to back up snapshots of virtual machines that take an hour or two to configure, for example license servers, video conferencing servers, even terminal servers (if they hold no data): it is easier to save a fully configured machine and restore it quickly than to play strategy games with snapshots (which keep growing);
  • with dynamic disks, many virtual machines without snapshots weigh up to 20 GB (for example, domain controllers), which is at most an hour of export; after that hour we have a full working copy of the virtual machine that can be deployed in place on the backup server in 2 minutes, provided it was exported there.
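Deploying an exported copy on the backup server comes down to `Import-VM`. A sketch, assuming the Windows Server 2012 R2 export layout, where `Export-VM -Path X` creates `X\<VM name>\Virtual Machines\<GUID>.xml` (newer Hyper-V versions write `.vmcx` instead); the function names and target folders are hypothetical. Importing as a copy with a new ID avoids the duplicate-ID trap from the DPM story above.

```powershell
# Hypothetical helpers for registering an exported VM on the backup host.

function Find-ExportedVmConfig {
    param([Parameter(Mandatory)][string]$ExportPath)   # the <VM name> folder of an export
    $configDir = Join-Path $ExportPath 'Virtual Machines'
    # Take the first configuration file (.xml on 2012 R2, .vmcx on newer hosts)
    Get-ChildItem -Path $configDir -File |
        Where-Object { $_.Extension -in '.xml', '.vmcx' } |
        Select-Object -First 1 -ExpandProperty FullName
}

function Import-VmCopy {
    param(
        [Parameter(Mandatory)][string]$ExportPath,
        [Parameter(Mandatory)][string]$TargetPath
    )
    $config = Find-ExportedVmConfig -ExportPath $ExportPath
    if (-not $config) { throw "No VM configuration found under $ExportPath" }
    # -Copy leaves the export untouched; -GenerateNewId gives the copy its own ID,
    # so it cannot clash with the original VM still registered in the cluster
    Import-VM -Path $config -Copy -GenerateNewId `
        -VirtualMachinePath $TargetPath `
        -VhdDestinationPath (Join-Path $TargetPath 'Virtual Hard Disks')
}
```

If the original machine is gone and the copy should take its place with the same ID, `Import-VM -Path $config` without `-Copy` registers the exported files in place instead.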


What is critical for this method is time: however fast your network is, exporting a copy over the network is a slow process for Hyper-V, about 40 GB per hour on a gigabit network. At least that is what we get; if yours is faster, please explain in the comments why. By this estimate, overnight, from 00:00 to 7:00, we can save about 280 GB. That is not much, and a 300 GB virtual machine is a large and clumsy beast, so after analyzing our tasks we limited virtual machines to 100-120 GB. Almost all of our servers, even database servers, fit into this range. We scheduled backups according to how often a virtual machine changes and how important it is: the more often it changes, the more often we copy it. Database servers are the exception: we copy them once a week.
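The nightly budget is simple arithmetic: throughput times window length, compared with the total size of the machines scheduled for that night. A small sketch; the function names are hypothetical, and the defaults are the 40 GB/h and 7-hour figures from the text.

```powershell
# Hypothetical helpers for checking a night's export plan against the window.

function Get-BackupWindowCapacity {
    param(
        [double]$ThroughputGBPerHour = 40,
        [double]$WindowHours = 7
    )
    # Total data (GB) that can be exported in one window
    $ThroughputGBPerHour * $WindowHours
}

function Test-FitsInWindow {
    param(
        [Parameter(Mandatory)][double[]]$VmSizesGB,
        [double]$ThroughputGBPerHour = 40,
        [double]$WindowHours = 7
    )
    # Sum the VMs planned for one night and compare with the window capacity
    $total = ($VmSizesGB | Measure-Object -Sum).Sum
    $total -le (Get-BackupWindowCapacity -ThroughputGBPerHour $ThroughputGBPerHour -WindowHours $WindowHours)
}
```

For example, `Test-FitsInWindow -VmSizesGB 120, 100, 20, 20` returns `True` (260 GB out of 280 GB), while a single 300 GB machine does not fit.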

File servers and monsters such as update servers (all machines larger than 200 GB) we handle as follows: if no physical disks are attached to the virtual machine, it is backed up once every three or six months; if physical disks are attached, the virtual machine is backed up once, just to preserve its configuration, because a physical disk cannot be exported. For such machines you have to copy the files with a separate script or manually; we do it with the following batch file:

xcopy F:\DATA H:\BackUP /H /Y /C /R /E

Drive H is an iSCSI-connected drive.
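If a return code matters, for example to log failures the way the export script does, robocopy (included with Windows Server 2012 R2) can replace xcopy. A sketch, assuming the same `F:\DATA` source and iSCSI drive `H:`; the function names, log path and retry values are hypothetical choices. Robocopy exit codes below 8 mean success (with or without files copied); 8 and above mean at least one failure.

```powershell
# Hypothetical robocopy wrapper for the file-server copy job.

function Test-RobocopySuccess {
    param([Parameter(Mandatory)][int]$ExitCode)
    # 0-7: success variants (nothing copied, files copied, extra files...); 8+: failures
    $ExitCode -lt 8
}

function Copy-FileShareData {
    param(
        [string]$Source = 'F:\DATA',
        [string]$Destination = 'H:\BackUP',
        [string]$LogPath = 'H:\BackUP-robocopy.log'   # hypothetical log location
    )
    # /E copies subfolders including empty ones, /R and /W limit retries on locked
    # files, /NP keeps progress percentages out of the log, /LOG+ appends to the log
    robocopy $Source $Destination /E /R:2 /W:5 /NP "/LOG+:$LogPath"
    if (-not (Test-RobocopySuccess -ExitCode $LASTEXITCODE)) {
        throw "robocopy failed with exit code $LASTEXITCODE"
    }
}
```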

Thus we separated file backups from virtual machine backups. For greater reliability we also separated the networks: the Control network for hosts and the enterprise network for virtual machines. Hosts cannot be reached from the enterprise network, which protects them both from unauthorized access and from the viruses users spread so successfully. The Control network is physically separate: dedicated switches connected to dedicated host ports, and when virtual switches are set up on the hosts, only the Control network is used by the hosts themselves. A separate virtual switch, not used by the hosts, was created for the enterprise network. As a result, problems in the enterprise network, such as switching loops or a runaway network card, do not affect the Control network.
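On a Hyper-V host, keeping the enterprise network away from the host OS is a property of the virtual switch. A sketch, assuming the Hyper-V module; the function name and the switch and adapter names are hypothetical.

```powershell
# Hypothetical helper: external switch for VM traffic only.

function New-EnterpriseSwitch {
    param(
        [string]$SwitchName = 'Enterprise',       # hypothetical switch name
        [string]$AdapterName = 'Enterprise-10G'   # hypothetical physical NIC name
    )
    # External switch bound to the enterprise NIC; -AllowManagementOS $false means
    # the host gets no virtual NIC on this switch, so it is unreachable from that network
    New-VMSwitch -Name $SwitchName -NetAdapterName $AdapterName -AllowManagementOS $false
}
```

`Get-VMSwitch | Select-Object Name, SwitchType, AllowManagementOS` shows whether any switch still shares its uplink with the host.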

Now the configuration. All servers run Windows Server 2012 R2 Datacenter. The cluster consists of 6 Fujitsu blades, 3 BX924 S3 and 3 BX924 S4, housed in a BX900 S2 chassis. Through an FC switch, the blades connect to an ETERNUS DX80 disk shelf, on which LUNs are allocated: 250 GB for the operating systems, 20 GB for the quorum and 6 TB for virtual machine storage. The cluster is a Hyper-V cluster. There is also a standalone Fujitsu RX200 server with the Hyper-V role, which hosts the main enterprise domain controllers and the main cluster DC; the secondary ones are in the cluster. Our server fleet also includes a Supermicro backup server with two processors and 196 GB of RAM: this is the Hyper-V Rezerv server mentioned above.

The Control network is 1 Gbit/s; the enterprise network is 10 Gbit/s between servers and 1 Gbit/s in other segments.



Now to the script itself. I wrote it so that it finds a virtual machine in the cluster, connects to the cluster node that hosts it, shuts down (or saves) the virtual machine, exports it to a share on Hyper-V REZERV, and then starts the virtual machine again.
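As a condensed illustration of that flow, here is a sketch that differs from the full script in one assumption: instead of scanning every node, it asks the cluster which node owns the VM, which works when the clustered role has the same name as the VM (the usual case for a VM made highly available). The function names and parameters are hypothetical, and CredSSP must already be configured.

```powershell
# Hypothetical condensed version of the export flow.

function Get-BackupFolderName {
    param([datetime]$When = (Get-Date))
    # Same dd.MM.yy.HH.mm folder naming as the full script
    $When.ToString('dd.MM.yy.HH.mm', [Globalization.CultureInfo]::InvariantCulture)
}

function Export-ClusterVm {
    param(
        [Parameter(Mandatory)][string]$VmName,
        [Parameter(Mandatory)][string]$ShareRoot,
        [Parameter(Mandatory)][System.Management.Automation.PSCredential]$Credential
    )
    # Ask the cluster which node currently owns the VM's role
    $node = (Get-ClusterGroup -Name $VmName).OwnerNode.Name
    $folder = Get-BackupFolderName

    Invoke-Command -ComputerName $node -Authentication CredSSP -Credential $Credential -ScriptBlock {
        param($Name, $Root, $Folder)
        # <share>\<node name>\<date> on the backup server
        $target = Join-Path (Join-Path $Root $env:COMPUTERNAME) $Folder
        New-Item -Path $target -ItemType Directory -Force | Out-Null
        $wasRunning = (Get-VM -Name $Name).State -eq 'Running'
        try {
            if ($wasRunning) { Stop-VM -Name $Name -ErrorAction Stop }
            Export-VM -Name $Name -Path $target -ErrorAction Stop
        }
        finally {
            # Start the VM again only if it was running before the backup;
            # any export error still propagates to the caller afterwards
            if ($wasRunning) { Start-VM -Name $Name }
        }
    } -ArgumentList $VmName, $ShareRoot, $folder
}
```

Usage might look like `Export-ClusterVm -VmName 'Stream.local.ru (2012r2)' -ShareRoot '\\backup-host\exports' -Credential (Get-Credential)`, where the share path is a placeholder. Unlike the full script, this sketch leaves a VM that was already off switched off.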

Here is the script text:

$LOG = "C:\ClusterStorage\Volume1\ps\log\log.txt"   # log file location
$h = "Gse264-bl1","Gse264-bl2","Gse264-bl3","Gse264-bl4","Gse264-bl5","Gse264-bl6"   # cluster hosts
$VmName = "Stream.local.ru (2012r2)"   # name of the virtual machine
$blade = hostname   # name of the host the script runs on
$password = ConvertTo-SecureString "Ofiget_Arbuz" -AsPlainText -Force   # domain administrator password for the Control network
$cred = New-Object System.Management.Automation.PSCredential ("control", $password)   # credentials for remote sessions
foreach ($i in $h) {   # for each host in the cluster
    $vms = Get-ClusterNode $i | Get-ClusterResource | ?{ $_.ResourceType -eq 'Virtual Machine' } | Get-VM   # VMs on this host
    foreach ($cn in $vms) {   # for each VM on the host
        if ($cn.Name -eq $VmName) {   # is this the machine we need?
            write "$(get-date -format "dd.MM.yy.HH.mm.ss") Found $VmName on $i" | out-file $LOG -append
            if ($i -eq $blade) {   # the VM is on the host running the script
                write "$(get-date -format "dd.MM.yy.HH.mm.ss") $($cn.Name) is local" | out-file $LOG -append
                $path = get-date -format "dd.MM.yy.HH.mm"   # folder name
                New-Item -Path "\\Hyper-V REZERV\$blade\$path" -ItemType "directory"
                write "$(get-date -format "dd.MM.yy.HH.mm.ss") ______EXPORT_____________" | out-file $LOG -append
                Get-Date | out-file $LOG -append
                try {
                    $vmstate = get-vm $VmName
                    if ($vmstate.State -eq "Running") {
                        write "$(get-date -format "dd.MM.yy.HH.mm.ss") shutting down $VmName" | out-file $LOG -append
                        Stop-VM $VmName -ErrorAction Stop
                    } else { write "$(get-date -format "dd.MM.yy.HH.mm.ss") $VmName was already shut down" | out-file $LOG -append }
                    write "$(get-date -format "dd.MM.yy.HH.mm.ss") trying to export $VmName" | out-file $LOG -append
                    Export-VM -Name $VmName -Path "\\Hyper-V REZERV\$blade\$path" -ErrorAction Stop
                    write "$(get-date -format "dd.MM.yy.HH.mm.ss") starting $VmName again" | out-file $LOG -append
                    Start-VM $VmName -ErrorAction Stop
                    write "$(get-date -format "dd.MM.yy.HH.mm.ss") EVERYTHING SUCCEEDED!!!" | out-file $LOG -append
                } catch {
                    $path | Out-File $LOG -append
                    "$_" | Out-File $LOG -append
                    Start-VM $VmName
                    write "$(get-date -format "dd.MM.yy.HH.mm.ss") $VmName errors occurred!!!" | out-file $LOG -append
                }
            } else {   # the VM is on another host: run the same steps there over CredSSP
                $s = New-PSSession -ComputerName $i -Authentication CredSSP -Credential $cred
                Invoke-Command -Session $s -ScriptBlock {
                    $blade = hostname
                    $LOG = "C:\ClusterStorage\Volume1\ps\log\log.txt"
                    $path = get-date -format "dd.MM.yy.HH.mm"
                    New-Item -Path "\\Hyper-V REZERV\$blade\$path" -ItemType "directory"
                    $vm = $using:VmName   # same VM name as in the outer script
                    write "$(get-date -format "dd.MM.yy.HH.mm.ss") ______EXPORT_____________" | out-file $LOG -append
                    try {
                        $vmstate = get-vm $vm
                        if ($vmstate.State -eq "Running") {
                            write "$(get-date -format "dd.MM.yy.HH.mm.ss") shutting down $vm" | out-file $LOG -append
                            Stop-VM $vm -ErrorAction Stop
                        } else { write "$(get-date -format "dd.MM.yy.HH.mm.ss") $vm was already shut down" | out-file $LOG -append }
                        write "$(get-date -format "dd.MM.yy.HH.mm.ss") trying to export $vm" | out-file $LOG -append
                        Export-VM -Name $vm -Path "\\Hyper-V REZERV\$blade\$path" -ErrorAction Stop
                        write "$(get-date -format "dd.MM.yy.HH.mm.ss") starting $vm again" | out-file $LOG -append
                        Start-VM $vm -ErrorAction Stop
                        write "$(get-date -format "dd.MM.yy.HH.mm.ss") $vm EVERYTHING SUCCEEDED!!!" | out-file $LOG -append
                    } catch {
                        $path | Out-File $LOG -append
                        "$_" | Out-File $LOG -append
                        Start-VM $vm
                        write "$(get-date -format "dd.MM.yy.HH.mm.ss") $vm errors occurred!!!" | out-file $LOG -append
                    }
                }
                Remove-PSSession $s
            }
        }
    }
}
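The script depends on two pieces of one-time setup: CredSSP delegation, so that Export-VM running on a remote node can authenticate to the backup share (the second hop), and an elevated scheduled task. A configuration sketch to run in an elevated session; the script path, task name, schedule, account and password are hypothetical placeholders.

```powershell
# On the machine that starts the script: allow delegating credentials to the cluster nodes
Enable-WSManCredSSP -Role Client -Force -DelegateComputer `
    'Gse264-bl1', 'Gse264-bl2', 'Gse264-bl3', 'Gse264-bl4', 'Gse264-bl5', 'Gse264-bl6'

# On every cluster node: accept delegated credentials, so a remote session
# can use them to reach the backup share
Enable-WSManCredSSP -Role Server -Force

# Scheduled task that runs the script with highest privileges under a stored
# domain account, whether or not that user is logged on (all values hypothetical)
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument '-NoProfile -ExecutionPolicy Bypass -File C:\ClusterStorage\Volume1\ps\export-vm.ps1'
$trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At '00:00'
Register-ScheduledTask -TaskName 'Export Stream VM' -Action $action -Trigger $trigger `
    -User 'DOMAIN\backup-admin' -Password 'placeholder' -RunLevel Highest
```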


The script is not complicated: it tracks errors with try/catch and writes a series of log messages that show how successful the export was. There are a few catches. The script runs as a domain administrator, and in the scheduled task you need to check "Run whether user is logged on or not" and "Run with highest privileges"; without these, a remote connection to another host will not work. PowerShell must also be configured to export a virtual machine from one remote computer to a share on another, which is the multi-hop authentication (CredSSP) problem; we set it up following a separate article on the topic. For Hyper-V to be able to export to a network folder, you need to add the computer account of the exporting host to the folder's security settings and give it full access. On the Hyper-V REZERV backup server, a script deletes files and folders last modified more than 63 days ago. That server also runs Veeam Backup Free Edition, which once a month writes all folders with virtual machines exported during the month to magnetic tape over Fibre Channel.
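A sketch of the 63-day clean-up on the backup server, assuming the `<share>\<host name>\<dd.MM.yy.HH.mm>` layout the export script creates and deleting whole dated folders rather than individual files; the function names and the root path you pass in are hypothetical.

```powershell
# Hypothetical retention helpers for the backup share.

function Get-ExpiredBackupFolder {
    param(
        [Parameter(Mandatory)][string]$Root,
        [int]$MaxAgeDays = 63,
        [datetime]$Now = (Get-Date)
    )
    $cutoff = $Now.AddDays(-$MaxAgeDays)
    # Export folders sit two levels down: <Root>\<host>\<date folder>
    Get-ChildItem -Path $Root -Directory |
        Get-ChildItem -Directory |
        Where-Object { $_.LastWriteTime -lt $cutoff }
}

function Remove-ExpiredBackupFolder {
    param(
        [Parameter(Mandatory)][string]$Root,
        [int]$MaxAgeDays = 63
    )
    # Delete each expired export folder with everything inside it
    Get-ExpiredBackupFolder -Root $Root -MaxAgeDays $MaxAgeDays |
        Remove-Item -Recurse -Force
}
```

Running `Get-ExpiredBackupFolder -Root <share root>` on its own first shows what would be deleted.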

To summarize: we created a script for full export of a virtual machine. On the one hand this is inconvenient; on the other, it makes it possible to restore a server to full working order almost instantly. A full copy does take a long time, but since the hosts sit in a separate small network, physically separated from the enterprise network, this does not affect the enterprise's productivity. Some machines, such as database servers, must be shut down before export to get a consistent backup, so I advise keeping such servers small; if that does not work, back up only the databases, for example on a schedule using MS SQL itself. For smaller virtual machines it is better to use dynamic disks, so that export takes less time.

For example, all our domain controllers use dynamic disks, so these virtual machines weigh up to 20 GB and export takes 30-40 minutes. Some machines can also be backed up "hot", without putting them into a saved state or shutting them down, but I did not find any detailed articles describing tests of such exports; if you know more about this, please write in the comments. All I have to do now is check the logs to see whether the script worked, so I believe such a system can genuinely protect the enterprise against data loss and save money.

PS: I am deeply grateful to the users of the TechNet forum; without their help, building my first cluster and setting up backups would have been much harder. I hope this article will be useful and interesting to inexperienced admins!
