Archive this: how file system archiving works with Commvault


    In the past, so-called long-term archiving was more common: files that had to be kept for several years by law were dumped onto tapes, and the tapes were taken away to dedicated storage if necessary. On special occasions, an audit for example, the tapes were brought back to the office and the required files were restored from them. Once disk storage became widely available, it became possible to archive not only critical accounting or legal documents but also ordinary files that you are reluctant to delete (they might come in handy) yet do not want to keep on fast storage.

    Such archiving usually works as follows: archiving rules are defined (based on when a file was last opened, edited or created), and all files that match these rules are automatically moved from production storage to an archive on slower disks.

    Today I want to talk about this kind of archiving, using the Commvault solution as an example.

    A disclaimer up front: archiving is not backup


    As you might have guessed, the main benefit of archiving is saving storage space. Quarterly reports needed only during an audit, photos from the corporate New Year's party two years ago, and in general everything that is not in active use moves to the archive instead of sitting on primary storage as ballast. Since fewer files remain, backups of production storage get smaller, which means less backup space is needed.
    As a rule, archiving licenses are cheaper than backup licenses.

    Example: a hypothetical backup license costs $100 per 1 TB per month, and an archiving license costs $70. A client has a server with 5 TB of data, backs all of it up and pays $500 a month. After moving 4 TB to the archive, 1 TB remains for backup, i.e. $100 a month. The archive costs 4 TB x $70 = $280. As a result, instead of the initial $500 the client pays $380, saving $120 a month. Multiplied by 12, that is $1,440 less per year.
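
    The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the prices are the hypothetical figures from the example above, and the function and parameter names are invented for illustration.

```python
def monthly_cost(backup_tb, archive_tb, backup_price=100, archive_price=70):
    """Monthly license cost in dollars for a given backup/archive split.

    Prices are the hypothetical per-TB figures from the example,
    not real Commvault pricing.
    """
    return backup_tb * backup_price + archive_tb * archive_price


before = monthly_cost(backup_tb=5, archive_tb=0)  # everything is backed up
after = monthly_cost(backup_tb=1, archive_tb=4)   # 4 TB moved to the archive
monthly_saving = before - after
annual_saving = monthly_saving * 12

print(before, after, monthly_saving, annual_saving)
```

    Changing the split or the prices shows how quickly the saving shrinks when the price gap between the two licenses is small.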

    You can go further and add the value of the space freed on production storage by the files that moved to the archive, as well as the savings from deduplication, which also works for archives. The numbers look so good that a bright idea comes up: why not replace the more expensive backup with archiving altogether? This is where the problems begin.

    Archiving is not backup. Unlike backup, it does not support versioning: a file stays in the archive in exactly the form in which it arrived. Second, if something happens to the archive storage and there is no backup or copy at a second site, the archive is lost.
    In fact, the two solve different problems: archiving optimizes space on production storage, while backup protects against data loss.

    What Commvault offers for archiving


    In Commvault, archiving is handled by the same agent as backup: OnePass. Within a single job, part of the data goes to backup, and the part that matches the archiving rules is archived. So if you already back up data with Commvault and want to try archiving, you do not need to install any additional agents.

    OnePass works as follows:

    1. If full and incremental backups of the files already exist, it is recommended to run a synthetic full backup. In this case the backup is assembled from the last full backup and all subsequent incremental and/or differential copies, so the resources of the source server are not involved.

    2. After the backup completes, OnePass identifies the files that match the archiving rules and moves them to the archive (dedicated space on existing storage or a separate archive storage, whichever you choose).

    The criteria by which OnePass decides to send the file to the archive are as follows:

    • when to start moving files off to the archive (depending on how much free disk space is left);
    • when the file was last opened;
    • when the file was last edited;
    • when the file was created;
    • file size.


    All of this is configured in the archiving rule settings.

    3. Files selected for archiving are either removed from production storage entirely or replaced with stubs, a kind of shortcut.

    In the second case, little changes for the end user. If Marya Ivanovna the accountant needs to show an auditor a report from five years ago, she simply clicks the stub, and the file moves back to production storage and opens as usual. Small files are restored from the archive quickly: a Word file under 1 MB takes a few seconds. A video will take longer.

    The recalled file stays on production storage until it matches the archiving policy again. Until then, it is covered by the backup job.

    Files marked with a cross are the stubs.
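
    To make the stub mechanism more concrete, here is a toy model in Python. It is not how OnePass is implemented: the `.stub` suffix, the JSON pointer format and both function names are assumptions made purely for illustration. The idea is only that the file body moves to the archive, a small pointer stays behind, and opening the pointer brings the file back.

```python
import json
import os
import shutil

STUB_SUFFIX = ".stub"  # hypothetical marker, not a Commvault convention


def archive_with_stub(path, archive_dir):
    """Move a file to archive_dir and leave a small pointer file in its place."""
    os.makedirs(archive_dir, exist_ok=True)
    # Toy model: files with the same name would collide in the archive.
    target = os.path.join(archive_dir, os.path.basename(path))
    shutil.move(path, target)
    stub_path = path + STUB_SUFFIX
    with open(stub_path, "w") as stub:
        json.dump({"archived_to": target}, stub)
    return stub_path


def recall(stub_path):
    """Bring the archived file back to its original location and drop the stub."""
    with open(stub_path) as stub:
        target = json.load(stub)["archived_to"]
    original = stub_path[: -len(STUB_SUFFIX)]
    shutil.move(target, original)
    os.remove(stub_path)
    return original
```

    In a real product the recall is triggered transparently when the user opens the stub; here it has to be called explicitly.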

    As with backup, the administrator can throttle recalls by limiting the number of recovery streams, so that the system does not collapse under a large number of requests. You can set the number of files restored simultaneously, the interval between restores, and so on.

    Files sent to the archive can be encrypted and stored in that form.
    The story of unclaimed files does not end once they reach archive storage. You can also configure rules for the archive storage (a retention policy) so that the archived data itself is deleted over time. For example, reports sit in the archive for the legally required three years and are then deleted automatically.
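
    A retention check of this kind boils down to comparing the archiving date with a cutoff. The sketch below assumes a simple mapping from file name to archiving date and a three-year period counted as 3 x 365 days; the function name and data layout are hypothetical and do not reflect how Commvault stores this information.

```python
from datetime import datetime, timedelta

THREE_YEARS = timedelta(days=3 * 365)  # simplification: leap days ignored


def expired(entries, now, retention=THREE_YEARS):
    """Return names of archive entries whose retention period has elapsed.

    entries maps a file name to the datetime it was archived.
    """
    return [
        name
        for name, archived_at in entries.items()
        if now - archived_at > retention
    ]
```

    For instance, with `now` set to 1 January 2024, a report archived in June 2019 would be listed for deletion, while one archived in June 2023 would not.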



    On a test disk with copies of HR documents, plus videos and photos from all kinds of corporate parties, I tried the following archiving rules: files older than 0 days, unchanged for more than 7 days, and larger than 1 MB. The result: before archiving, the data on production storage took up 391 GB; afterwards, only 1 GB.
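
    Before applying such rules for real, it can help to estimate what they would select. The script below is a rough sketch of the same three conditions applied to a directory tree with plain Python; it has nothing to do with the OnePass agent itself, and `ArchiveRule`, `matches` and `select_for_archive` are names made up for this example. Note that `st_ctime` is the creation time only on Windows; on Unix it is the time of the last metadata change, so the age check is approximate there.

```python
import os
import time
from dataclasses import dataclass

DAY = 86400  # seconds


@dataclass
class ArchiveRule:
    """Hypothetical rule set mirroring the test above."""
    min_age_days: float = 0          # file is at least this old
    min_unmodified_days: float = 7   # not changed for more than this
    min_size_bytes: int = 1024 * 1024  # larger than 1 MB


def matches(rule, size, mtime, ctime, now):
    """True if a file with these attributes satisfies every condition."""
    return (
        now - ctime >= rule.min_age_days * DAY
        and now - mtime > rule.min_unmodified_days * DAY
        and size > rule.min_size_bytes
    )


def select_for_archive(root, rule, now=None):
    """Walk root and return (path, size) for every file matching the rule."""
    now = time.time() if now is None else now
    selected = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if matches(rule, st.st_size, st.st_mtime, st.st_ctime, now):
                selected.append((path, st.st_size))
    return selected
```

    Summing the sizes in the returned list gives an estimate of how much space the rule would free on production storage.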





    How to decide what to send to the archive


    To determine which values to set for each parameter, OnePass has the System Discovery and Archive Analyzer Tool (available to Commvault users). It scans files by time of last modification, last access and creation, as well as by size. These raw statistics can then be sent to Commvault, which returns graphs and charts that show clearly which archiving rules make sense. Not the most convenient scheme, I admit, but it makes clear which direction to dig in.


    The graph shows how long ago files were last modified. Screenshot from the Commvault documentation.


    And here are the statistics on when files were last opened. Screenshot from the Commvault documentation.
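
    The raw numbers behind charts like these can be roughly approximated without any special tool. The sketch below totals file sizes by how long ago each file was last modified. It is not the Archive Analyzer Tool and does not reproduce its output; the bucket boundaries and function names are arbitrary assumptions chosen for the example.

```python
import os
import time
from collections import Counter

DAY = 86400  # seconds

# Hypothetical age buckets: (upper bound in days, label)
BUCKETS = [(30, "0-30 days"), (90, "31-90 days"), (365, "91-365 days")]
OLDEST = "over 365 days"


def bucket_for(age_days):
    """Map a file age in days to a bucket label."""
    for limit, label in BUCKETS:
        if age_days <= limit:
            return label
    return OLDEST


def size_by_age(root, now=None):
    """Total bytes under root per bucket, by time since last modification."""
    now = time.time() if now is None else now
    totals = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.stat(os.path.join(dirpath, name))
            totals[bucket_for((now - st.st_mtime) / DAY)] += st.st_size
    return dict(totals)
```

    Swapping `st_mtime` for `st_atime` gives the same breakdown by last access, provided access times are actually recorded on the volume.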

    Reports are also produced on file sizes and formats. The most important one, though, is the File Level Analytics Report. It suggests archiving rules and shows how much space you can save by applying them.

    The report promises that sending all files larger than 10 MB that have not changed for more than 90 days to the archive will save 3.85 TB. Do not pay attention to the monetary savings estimate: for some reason, it prices 1 GB of disk space at 10 dollars.
