Backing up user data

Not long ago, while testing EMC product lines, I had the opportunity to work with EMC Avamar Virtual Edition. In this article I want to share that experience.

Problem history

After talking with colleagues from various companies, I concluded that many of them do not back up employee workstations at all and limit themselves to backing up production applications.

On the one hand, this is understandable: even for production services there are still few established practices that have become mainstream. Ensuring good RTO and RPO figures for critical services is therefore one of the main headaches of IT departments in both large and medium-sized companies. Spending time on such "trifles" as backing up user machines looks like an unaffordable luxury.

In practice, however, this means that a workstation failure leads either to moving the hard drives from the old device to the new one (with the resulting difficulties in inventory and accounting) or to the complete loss of user data.

Of course, it has long been considered good practice to store important data on NAS servers and/or corporate file shares. In that case all important information survives, but temporary and non-critical data is lost. In addition, after the hardware is replaced, the user has to personalize the workplace all over again, which does not improve anyone's mood.

This may not matter at the scale of the whole business, but in practice it causes operational delays, increases downtime after a breakdown, and simply raises the tension between the IT department and its internal customers. The latter is an important factor that many IT managers undeservedly ignore.


This problem can be solved in several ways:
1. Leave everything as it is;
2. Virtualize the workplaces of employees;
3. Implement a backup service for workstations.

Of course, the second scenario offers the greatest benefits, which have already been described and proven many times in various sources. However, it is a rather expensive solution to implement, and the transition from the "classical" workplace to a virtualized one involves many difficulties, both technical and operational.

Centralized backup tools, on the other hand, are easier to implement, do not require a major restructuring of the infrastructure, and are often cheaper.

What can EMC offer us?

Recently, EMC has undergone significant structural changes that resulted in, among other things, the creation (read: "merger") of a new division, DPAD (Data Protection and Availability Division). As the name implies, this division covers solutions for data protection and availability. Here we are interested only in the protection solutions.
At the moment, EMC DPAD offers three data backup solutions: Data Domain, NetWorker, and Avamar. This article focuses on EMC Avamar, but each product deserves a few words.


NetWorker

Software for centralized backup and data recovery. It offers broad functionality for automation, centralized management, and monitoring, and in general it is optimized for large and complex infrastructures.

Data Domain

A hardware solution for storing backups that works over CIFS, NFS, Boost, and VTL (Virtual Tape Library). Data Domain has no backup tools of its own and requires separate backup software, such as NetWorker. Boost technology, an advanced interface for transporting deduplicated data and managing Data Domain systems, is now gaining popularity.


Avamar

A hardware and software solution designed to create and store backup copies of workstations and of services and applications with medium or low load.

How does it work?

Avamar follows the classical client-server scheme. Client applications are installed on the protected machines; they collect the data, deduplicate it (exclude duplicate data), and transmit it.
As a result, copying is always incremental: the system first makes a full copy of the data and afterwards transfers only the changed blocks. This significantly reduces the load on the network channel and the cost of storage capacity. To avoid confusion among the incremental backups, Avamar keeps synthetic full backups.
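To make the idea of a synthetic full backup more concrete, here is a minimal Python sketch. It is not Avamar's internal format: representing a backup as a dictionary of block number to bytes, and the function name synthetic_full, are assumptions made only for illustration.

```python
def synthetic_full(first_full, incrementals):
    """Merge the first full copy with later increments of changed blocks.

    Each backup is modelled as {block_number: block_bytes} (an assumption
    for this sketch). Later increments override earlier blocks.
    """
    view = dict(first_full)          # copy, so the original full stays intact
    for changed_blocks in incrementals:
        view.update(changed_blocks)  # only changed blocks were transferred
    return view


full = {0: b"boot", 1: b"docs-v1", 2: b"mail-v1"}
monday = {1: b"docs-v2"}             # only block 1 changed on Monday
tuesday = {2: b"mail-v2"}            # only block 2 changed on Tuesday

restored = synthetic_full(full, [monday, tuesday])
```

The restored view looks like a full backup taken on Tuesday, although only two blocks were sent after the first copy.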

The key to how Avamar works is its deduplication algorithm.
In a nutshell, this technology avoids transferring data that has already been backed up, not only from the same computer but from any other source as well.

Below is a slightly more detailed description of the deduplication algorithm used by Avamar.
In Avamar, deduplication happens on the client, that is, at the "source" of the data.
As an example, let's walk through a backup task for a completely new file that has never been backed up by anyone:
1) A backup task arrives;
2) The Avamar client launches the avtar.exe process, which handles deduplication; processing starts at the file level;
3) Using the SHA-1 algorithm, a 160-bit hash of the file metadata is calculated;
4) Avtar looks up this hash in its database of "file hashes", F_cache.dat. The file is located on the client, for example in C:\Program Files\avs\var, and is loaded entirely into RAM during the backup;
5) Since our file is new, its hash is not found in F_cache.dat on the client, so the search continues on the Avamar server in the same F_cache.dat, which there holds the hashes collected from all previous backups of all workstations;
6) The hash is not found there either, so avtar moves to the block level and from this point on works with data blocks;
7) The data is divided into variable-length segments from 1 B to 64 KB, 25 KB on average;
8) The segments are then compressed by 30-50%; if a segment cannot be compressed by more than 25%, it is processed further without compression;
9) A 160-bit hash is calculated for each compressed or uncompressed segment;
10) The hashes are combined into composites of 8,000 to 30,000 hashes;
11) Hashes of these composites are calculated;
12) The composite hashes are again combined into composites;
13) Finally, a root hash is calculated: the hash of the composite of composite hashes;
14) Avtar looks up the root hash in its database of "block hashes", P_cache.dat;
15) If the root hash is not found on the client, the search continues on the Avamar server;
16) If the root hash is not found on the Avamar server, the search is repeated for the composite hashes, again first on the client and then on the server;
17) And so on down the tree until some hash is found;
18) Only after all unique data blocks (those whose hashes are not recorded anywhere) have been identified are they transferred from the workstation to the Avamar server;
19) On the Avamar server, each newly saved "data block - hash" pair is recorded under its unique index;
20) A backup map is created, consisting of links to files and blocks;
21) All of this is recorded in the PostgreSQL database;
22) The number of the backup task is recorded; it is later used to determine which client the backup came from, the retention time of the backup, the index tree, the backup map, and so on.
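The steps above can be illustrated with a heavily simplified Python sketch. It is not Avamar's implementation: it uses tiny fixed-size chunks instead of variable-length segments, skips compression, builds a single level of hash composites, and the names ToyServer, backup, and client_cache are invented for the example. Only the use of SHA-1 and the "client cache first, then server" lookup order come from the description.

```python
import hashlib

CHUNK_SIZE = 4  # assumption: tiny fixed-size chunks keep the demo readable


def sha1(data: bytes) -> bytes:
    """Return the 160-bit (20-byte) SHA-1 digest of data."""
    return hashlib.sha1(data).digest()


def split(data: bytes) -> list:
    """Cut data into fixed-size chunks (the real segments are variable)."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


class ToyServer:
    """Hypothetical stand-in for the server-side hash and block store."""

    def __init__(self):
        self.blocks = {}         # chunk hash -> chunk bytes
        self.roots = {}          # root hash -> ordered list of chunk hashes
        self.received_bytes = 0  # how much data actually crossed the wire

    def store(self, digest, chunk):
        self.blocks[digest] = chunk
        self.received_bytes += len(chunk)

    def restore(self, root):
        return b"".join(self.blocks[h] for h in self.roots[root])


def backup(data, client_cache, server):
    """Send only chunks whose hashes are unknown to the client and server."""
    chunks = split(data)
    hashes = [sha1(c) for c in chunks]
    root = sha1(b"".join(hashes))  # hash of the composite of chunk hashes
    if root in client_cache or root in server.roots:
        client_cache.add(root)
        return root                # whole file already known, nothing sent
    for digest, chunk in zip(hashes, chunks):
        if digest in client_cache:
            continue               # known locally, no server round trip
        if digest not in server.blocks:
            server.store(digest, chunk)
        client_cache.add(digest)
    server.roots[root] = hashes
    client_cache.add(root)
    return root


server = ToyServer()
cache_a, cache_b = set(), set()    # per-client hash caches

root_a = backup(b"AAAABBBBCCCC", cache_a, server)
sent_first = server.received_bytes
backup(b"AAAABBBBCCCC", cache_a, server)           # same file again
sent_repeat = server.received_bytes
root_b = backup(b"AAAABBBBDDDD", cache_b, server)  # another workstation
sent_other = server.received_bytes
```

In this toy run the second backup of the same file transfers nothing, and the second workstation sends only the one chunk the server has not seen before.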

This algorithm has two sides:

1. It significantly reduces the required network bandwidth and makes it possible to back up even sources at a remote site (for example, workstations in a branch network).
2. However, deduplication at the source consumes about 5-10% of the CPU and RAM. For a server this can be a critical figure, but on workstations it goes unnoticed.

When Avamar is integrated with EMC Data Domain, the server keeps only metadata on its local storage, while the data itself is compressed after deduplication and placed on the Data Domain. This scheme relies on the Boost technology used in Data Domain.

Recovery works as follows.
In general, we need a new or repaired device with the OS and the Avamar client installed. Windows devices are an exception: the file system with the bootable partition can be included in the backup, which lets you restore the backup to bare metal using the baremetal utility.


Since Avamar is a combination of software and hardware products, each solution is built from a set of nodes. A node is a host with Avamar deployed on it (the control node) or disk storage (storage nodes).
The Avamar Grid comes either as a single-node configuration (in this case, a single Storage Node) or as a RAIN (redundant array of independent nodes) configuration.

More details about types of nodes

  • Utility Node - Manages all the necessary services
  • Storage Node - Stores data; there are three types (M600, M1200, and M2400) that differ in performance and capacity
  • Spare Node - A standby node for the case when a storage node fails; it is activated and brought into operation manually
  • NDMP Node - A node for Avamar backups over NDMP
  • Media Access Node - A node for connecting tape devices

In addition, EMC offers its customers a virtual version: AVE (Avamar Virtual Edition). It requires a hypervisor on which the virtual machine with the Avamar server is deployed. A cluster datastore or external storage systems serve as the backup storage location.

Virtual Avamar is available in several editions with 0.5 TB, 1 TB, 2 TB, or 4 TB of usable capacity. The edition is determined by the license the company purchases.
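As a small illustration of how the edition choice could be reasoned about, here is a Python sketch that picks the smallest edition covering an estimated amount of stored data. The capacities come from the list above; the function name and the idea of choosing purely by capacity are assumptions, since real sizing also depends on retention and data change rate.

```python
AVE_EDITIONS_TB = [0.5, 1, 2, 4]  # usable capacities listed in the article


def smallest_edition(required_tb):
    """Return the smallest AVE capacity (in TB) that fits, or None."""
    for capacity in AVE_EDITIONS_TB:
        if required_tb <= capacity:
            return capacity
    return None  # more than 4 TB does not fit into a single AVE instance


choice = smallest_edition(0.7)
```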

EMC also promises smooth integration of AVE with VMware as the recommended hypervisor and with EMC Data Domain as the recommended storage.

I have worked with both Avamar Grid and Avamar Virtual Edition.

Since EMC offers a trial version of Avamar Virtual Edition, I'll show how to install it.

Avamar Virtual Appliance Installation Process

If you already have a VMware cluster up and running, the installation is quite simple:
1. Deploy the virtual machine image and its .vmdk disk, downloaded from the EMC partner resource, with the installation packages inside.
2. Depending on the AVE edition, create the required number of virtual disks and attach them to the AVE VM. In our case, these are 3 disks of 250 GB each.
3. After deploying and starting the VM, we get a bare SUSE Linux Enterprise system with all the necessary repositories inside. We connect to it through the vSphere virtual console and manage it from the command line. Using the script /usr/local/avamar/bin/ we prepare the OS for installing Avamar on it.
4. Configure the network interfaces. The easiest way is the dpnnetutil utility, which is already included in the virtual machine, but YaST2 works as well. After the network interfaces are configured, the VM must be rebooted.
5. Unpack the AVE installation wizard with the ./usr/local/avamar/src/ command, then launch it. The installation wizard has a convenient web interface: Avamar_Server_IP:8543/avi/avigui.html.

The installation and configuration of Avamar takes a long time and requires user participation.

8. After the installation in the web interface is complete, we run the final script in the console.

9. Once the server is installed, we need to download and install the management console. To do this, open the server's IP address (the one set in step 4) in a browser. From the web interface you can download the client for various operating systems, as well as the baremetal utility for Windows. All the necessary documentation on everything Avamar can do is there too.


This is what the administrator console of the installed Avamar looks like:

Adding Clients

The administrator has several scenarios for activating (adding) clients. In all of them, the client must first be installed on the protected machine.

After that, the following scenarios are possible:
1. The user can send an activation request through the client:

2. The administrator can activate clients one by one, sending each an activation command by specifying its IP address.
3. The administrator can activate a pool of clients by specifying a pool of IP addresses.

In the first scenario, the client is automatically added to the root domain and the default policy is applied to it. In the second and third scenarios, the administrator can define the domain and policy for the added clients in advance.
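For the third scenario, it can help to prepare the list of addresses together with the target domain and policy before entering them in the console. The Python sketch below only builds such a plan; it does not talk to Avamar, and the function name, the "/sales" domain, and the "workstations" policy are hypothetical.

```python
import ipaddress


def plan_activation(first_ip, last_ip, domain, policy):
    """List every address in an inclusive range with its domain and policy."""
    start = ipaddress.ip_address(first_ip)
    end = ipaddress.ip_address(last_ip)
    if end < start:
        raise ValueError("last_ip must not be lower than first_ip")
    return [
        {"ip": str(ipaddress.ip_address(n)), "domain": domain, "policy": policy}
        for n in range(int(start), int(end) + 1)
    ]


# Hypothetical pool that crosses a /24 boundary.
plan = plan_activation("10.0.0.254", "10.0.1.1", "/sales", "workstations")
```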


Centralized management is built on grouping clients into hierarchical "domains" (not to be confused with Windows AD domains). In fact, a domain is just a tree-like folder structure that holds a list of clients, their copy policies, and users with their delegated rights.

This way you can combine protected devices into groups with their own copy policies and restrict which users have access to them.

Copy policies define all backup properties:


Maintenance window

The maintenance window includes creating checkpoints, validating checkpoints, and garbage collection.
A few words about checkpoints: a checkpoint is a backup of the server itself. If the host or virtual machine fails, these checkpoints make it possible to restore the management server without losing data, registered clients, users, domains, or policies.

Copy Objects

Copy Retention Time

You can schedule regular backups for any time on any day, choose the storage, and so on. Most operating systems are supported, and for each you can select specific directories to copy. This lets you protect only the data you need and reduces the cost of storing unnecessary copies.

In addition, when creating a policy you can specify a retention period for the copy. After this period, the copy is considered "expired" and is removed from the repository.
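The retention rule can be expressed in a few lines. The following Python sketch is only an illustration of the idea: the record layout (created, retention_days) and the function names are assumptions, and the real cleanup is performed by the server during its maintenance window.

```python
from datetime import date, timedelta


def is_expired(created, retention_days, today):
    """A copy expires once its retention period has fully passed."""
    return today > created + timedelta(days=retention_days)


def keep_unexpired(backups, today):
    """Return only the copies that must still be kept."""
    return [
        b for b in backups
        if not is_expired(b["created"], b["retention_days"], today)
    ]


backups = [
    {"name": "pc-01-jan", "created": date(2024, 1, 1), "retention_days": 30},
    {"name": "pc-01-feb", "created": date(2024, 2, 1), "retention_days": 30},
]
kept = keep_unexpired(backups, today=date(2024, 2, 10))
```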

For example, suppose you have a root user with access to the root domain, that is, to all clients.
You then divide all users by department: sales, marketing, warehouse, logistics, and so on.
For each domain (department), you can assign a user who manages its copy policies, or you can manage the copying for all departments yourself.
Users can have different access levels: status view, backup only, restore only, or full access. That is, a department head may be entitled to start an unscheduled backup, while the rest of the employees can only restore from existing copies.
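The example above can be modelled roughly as follows. This Python sketch is not Avamar's permission model: the mapping of access levels to allowed actions (for instance, that "backup only" also allows viewing), the domain paths, and the user names are all assumptions for illustration. It only shows how rights granted on a domain apply to everything beneath it in the tree.

```python
from enum import Enum


class Access(Enum):
    VIEW = "status view"
    BACKUP = "backup only"
    RESTORE = "restore only"
    FULL = "full access"


# Assumed mapping of access levels to concrete actions.
ALLOWED = {
    Access.VIEW: {"view"},
    Access.BACKUP: {"view", "backup"},
    Access.RESTORE: {"view", "restore"},
    Access.FULL: {"view", "backup", "restore", "manage_policy"},
}


def covers(granted_domain, target_domain):
    """A grant on a domain also applies to all of its subdomains."""
    prefix = granted_domain.rstrip("/") + "/"
    return target_domain == granted_domain or target_domain.startswith(prefix)


def can(grants, user, domain, action):
    """Check whether any grant of the user allows the action in the domain."""
    return any(
        covers(granted, domain) and action in ALLOWED[level]
        for granted, level in grants.get(user, [])
    )


grants = {
    "root": [("/", Access.FULL)],
    "sales_head": [("/sales", Access.BACKUP)],
    "sales_clerk": [("/sales", Access.RESTORE)],
}
head_can_backup = can(grants, "sales_head", "/sales", "backup")
clerk_can_backup = can(grants, "sales_clerk", "/sales", "backup")
```

Here the department head can start a backup in the sales domain, the clerk cannot, and the root user keeps full control over every department.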

To manage the backup process, the backup administrator must install the administrator console.
Ordinary users can view and restore only their own data.
To do this, they use the web interface, which can be opened through the client installed on the device.

Integration with VMware

Avamar can back up virtual machines running on the VMware vSphere ESXi hypervisor. There is no need to install a client on each VM: it is enough to deploy one virtual machine in the cluster that acts as a client for the entire hypervisor. It is called the Avamar proxy and uses VADP (vStorage APIs for Data Protection) for backups. Integrating Avamar with ESXi requires vCenter.

Installation process

The Avamar proxy distribution is available through the web interface as an .ova file.
After the .ova is deployed and the system boots, a script starts immediately to configure the network interfaces of the virtual machine.
Then, on the VM with the Avamar server, in the /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml file, set the value to true in the entry: entry key="allow_duplicate_client_names" value="true"
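If you prefer to script this change rather than edit the file by hand, the idea can be sketched in Python. The sample XML below is a simplified stand-in, not the real layout of mcserver.xml, and the set_entry helper is hypothetical; the key is matched case-insensitively because its exact capitalization in the file is an assumption here. Try it on a copy of the file first.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the preferences file (assumed structure).
SAMPLE = """<root>
  <node name="mcserver">
    <map>
      <entry key="allow_duplicate_client_names" value="false" />
    </map>
  </node>
</root>"""


def set_entry(xml_text, key, value):
    """Set value="..." on every <entry> whose key matches, ignoring case."""
    root = ET.fromstring(xml_text)
    changed = 0
    for entry in root.iter("entry"):
        if entry.get("key", "").lower() == key.lower():
            entry.set("value", value)
            changed += 1
    if changed == 0:
        raise KeyError(key)  # fail loudly instead of silently doing nothing
    return ET.tostring(root, encoding="unicode")


updated = set_entry(SAMPLE, "allow_duplicate_client_names", "true")
```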

Those who only need to back up VMware virtual machines can purchase VDP-A (vSphere Data Protection Advanced). This product is also based on VADP. In essence, VDP-A is Avamar Virtual Edition, stripped down and tailored for VMware. It costs less than EMC AVE, since it retains only the functionality for backing up virtual machines and some Microsoft applications.

Integration with Data Domain using the example of Data Domain 160

When Avamar is integrated with Data Domain (DD), the DD is used only to store the backups themselves; all metadata and caches remain on the Avamar server. The data is transferred over the Boost protocol.
First, create a user for Avamar on the DD:

Next, in Avamar Administrator, open the Server tab, select Server Management, and choose Actions > Add Data Domain System.

Fill in all the fields; to check the number of supported streams, click Get Stream Info.

Go to the SNMP tab in the same window. The Avamar server uses this protocol to receive information about the status of the DD.

As a result, we see the added system:

Finally, do not forget to tick the "Store backups on DD" checkbox in the DataSet settings.


The main advantages of EMC Avamar as a backup and recovery product are:

1. Versatility - support for most commercial operating systems, databases, and services.
2. Flexibility - various configurations, including a virtual edition for VMware vSphere.
3. Ease of use - a convenient client interface and integration with VMware vSphere.
4. Cost efficiency - tangible savings on network bandwidth thanks to deduplication on the client.

Thus, even with a staff of more than 50 people, we can back up their devices with little effort.
In most cases, given a virtualized infrastructure, Avamar VE (the virtual appliance) and enough storage space for the copies are all that is needed.

This allows centralized automatic backups without technical difficulties or extra spending on highly qualified staff.

The procedure for recovering lost user data then comes down to replacing the workstation with one of the same type and pressing a few buttons in the administrative console.
