Tweaking exdupe.exe, a nimble deduplicating archiver

Some time ago I ran into an unpleasant problem: I needed to back up several virtual machines. For me, a backup means not just an archive with the latest copy, but a small set of such archives made according to a given rotation scheme. The batch file for archiving was, of course, written quickly and worked flawlessly, but the size... The backup set was huge. It was especially frustrating because the virtual machines were nearly identical, and so were their successive backups. So I remembered the words “deduplication” and “diff” and started looking for a compression utility that could deduplicate.

Different utilities offered different approaches to compressing files with similar content, but they all had one thing in common: you pick one source file and point the utility at the others; it computes the difference between the source and each of the rest and archives the result. To restore, you give it the source file and the diff archive, and it rebuilds everything.
In short, the source had to be unpacked somewhere at all times, both when archiving and when extracting. Picture this: I have five virtual machines, and today I want to make an archive of the difference between today's machines and yesterday's. I have to:
- prepare a full copy of yesterday's entire farm,
- then run the utility, which computes the diff and archives it.
Now I want to copy all of this somewhere else. I am not going to haul yesterday's data (the source) around uncompressed, so the source has to be archived too. If tomorrow I need a diff between yesterday and tomorrow, the source must be available uncompressed and untouched: either a copy of yesterday's state, or an archive that has to be extracted first. If I need to restore onto a new host, I first have to extract the source and only then apply the diff.
Disk space can be bought, but time cannot, and extracting the source takes a lot of it! Then I found an archiver that did everything right: it packs the source into the archive with deduplication, then makes diffs directly against the compressed source, and archiving and extraction run about as fast as the hard drive allows. Cool! But it did not work under Windows 2003. Naturally, if everything had just worked, I would not be writing this article.
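What makes "diffs against the compressed source" possible is that a diff run does not need the source bytes, only a record of what has already been stored. One simple way to get that: the full archive keeps a fingerprint (hash) of every block it wrote, and a later run writes only blocks whose fingerprint is new. The sketch below shows this with fixed-size blocks; it is an illustration under that assumption, not exdupe's actual algorithm, and Archive, add and kBlockSize are hypothetical names. It uses std::hash only to stay short; a real tool would need a collision-resistant hash (or a byte comparison on a hash match), and fixed-size blocks miss duplicates that are shifted by inserted bytes.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Block size in bytes; tiny here so the usage example stays readable.
constexpr std::size_t kBlockSize = 4;

// Hypothetical archive model: fingerprints of every block seen so far,
// plus the blocks that were actually written out.
struct Archive {
    std::unordered_set<std::size_t> index;
    std::vector<std::string> stored;
};

// Splits `data` into fixed-size blocks and stores only blocks whose
// fingerprint is not in the index yet. Returns how many blocks were written.
std::size_t add(Archive& archive, const std::string& data) {
    std::size_t written = 0;
    for (std::size_t pos = 0; pos < data.size(); pos += kBlockSize) {
        std::string block = data.substr(pos, kBlockSize);
        std::size_t fingerprint = std::hash<std::string>{}(block);
        // insert().second is true only if the fingerprint was not present.
        if (archive.index.insert(fingerprint).second) {
            archive.stored.push_back(block);
            ++written;
        }
    }
    return written;
}
```

For example, adding "AAAABBBBCCCC" to an empty Archive writes three blocks. A second Archive whose index is copied from the first (the "diff" run) then writes only the "DDDD" block of "AAAABBBBDDDD", without ever looking at yesterday's data itself.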

Now for the fix.
The archiver is called exdupe. At the time it was freeware with partially published source code: the deduplication library was linked in statically, and only the command-line utility's code was published (the full source has since been released). Everything comes as a Visual Studio 2012 project. The program ran only on 64-bit Windows (mine is Windows 2003), and on startup it failed with this error:
Entry Point Not Found
The procedure entry point VssFreeSnapshotPropertiesInternal could not be located in the dynamic link library VSSAPI.DLL.
I downloaded the source code from the program's website right away (I patched version 0.5.0).
The error is caused by a version mismatch between the VSSAPI.DLL on my Windows and the one the project was built against.
After digging through the source, I decided the easiest way was to simply turn off Shadow Copy (VSS) support: remove the calls into the VSS library and stub out the functions responsible for accessing VSS. I must say the code was written competently, if not without errors; there were only two such functions, both in the file "shadow\shadow.cpp".
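Commenting the calls out is the quickest route, but it removes shadow copy support everywhere. As a point of comparison, a build meant to run on both Windows 2003 and newer versions could check at runtime whether VSSAPI.DLL exports VssFreeSnapshotPropertiesInternal and skip VSS only where it does not. This is a minimal sketch under that assumption: vss_api_is_new_enough is a hypothetical helper, not part of exdupe, and treating the presence of that one export as "this VSSAPI.DLL matches the SDK the project was built with" is a heuristic.

```cpp
// Hypothetical helper, not part of exdupe: checks whether the installed
// VSSAPI.DLL exports the entry point the stock build fails on.
#ifdef _WIN32
#include <windows.h>
#endif

bool vss_api_is_new_enough() {
#ifdef _WIN32
    // LoadLibraryW returns nullptr if the DLL cannot be loaded at all.
    HMODULE vss = LoadLibraryW(L"vssapi.dll");
    if (vss == nullptr) {
        return false;
    }
    // The export that Windows 2003's VSSAPI.DLL lacks (see the error above).
    FARPROC fn = GetProcAddress(vss, "VssFreeSnapshotPropertiesInternal");
    FreeLibrary(vss);  // we only needed to know whether the export exists
    return fn != nullptr;
#else
    return false;  // VSS exists only on Windows
#endif
}
```

shadow() and unshadow() would then return early when the check fails. On its own this is not enough: the missing entry point sits in the program's import table, so the Windows loader refuses to start the executable before any code runs. The check only helps if vssapi.dll is also delay-loaded (MSVC linker option /DELAYLOAD:vssapi.dll, linking delayimp.lib), in which case each VSS function is resolved only on its first call.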
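Here is what to do. The line numbers refer to the unmodified shadow\shadow.cpp; the edits go from the bottom of the file upward (342, 330, 177, 157), so each number still matches when you get to it.
  1. Find the function void unshadow(void).
  2. Comment out line 342:
        VssFreeSnapshotProperties(&prop.Obj.Snap);

     so that it becomes:
        //removing shadowing
        //VssFreeSnapshotProperties(&prop.Obj.Snap);

  3. Before line 330:
        ULONG fetched = 0;

     add an early return:
        //remove shadowing
        return 1;

     (A function declared void cannot return a value; if unshadow really is void in your copy, write a bare return; instead.)

  4. Find the function int shadow(vector volumes) and comment out line 177:
        hr = ::CreateVssBackupComponents(&comp);

     so that it becomes:
        //remove shadowing
        //hr = ::CreateVssBackupComponents(&comp);

  5. Before line 157:
        // Initialize COM and open ourselves wide for callbacks by
        // CoInitializeSecurity.
        HRESULT hr;

     insert an early return:
        //remove shadowing
        return 1;
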
  6. Before building, I recommend renaming the executable in the project properties so it is not confused with the author's “exdupe.exe”, for example “exdupe-050-noshadow.exe”.

Build, run: it works!
The algorithm really is clever. Naturally it eats memory and loads the CPU cores, but that can be tuned: I am comfortable running it with two threads via the "-t2" switch. On a regular hard disk, deduplicating and compressing the source gave:
- 65.7 GB in 44 min = 24.8 MB/s
- compressed to 24.1 GB, a compression ratio of 0.37
The next day's differential run:
- processing time 10 min 20 s = 106 MB/s
- diff compressed to 2.1 GB
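These figures are consistent with each other if 1 GB is read as 1000 MB and the next-day run again processed about 65.7 GB, which is what 106 MB/s over 10 min 20 s implies. Under those two assumptions the arithmetic can be checked with a few constexpr helpers; mb_per_s and ratio are illustrative names, and the diff-to-source ratio is derived here rather than quoted in the text.

```cpp
// Figures from the measurements above. Assumes 1 GB = 1000 MB and that the
// next-day run processed the same 65.7 GB (both are assumptions).
constexpr double kSourceGB = 65.7;   // source data
constexpr double kArchiveGB = 24.1;  // full archive
constexpr double kDiffGB = 2.1;      // next-day diff

// Average throughput in MB/s for `gb` gigabytes processed in `seconds`.
constexpr double mb_per_s(double gb, double seconds) {
    return gb * 1000.0 / seconds;
}

// Compressed size divided by original size (smaller is better).
constexpr double ratio(double compressed_gb, double original_gb) {
    return compressed_gb / original_gb;
}

// full run:  mb_per_s(kSourceGB, 44 * 60)       ~ 24.9 MB/s
// next day:  mb_per_s(kSourceGB, 10 * 60 + 20)  ~ 106 MB/s
// archive:   ratio(kArchiveGB, kSourceGB)       ~ 0.37
// diff:      ratio(kDiffGB, kSourceGB)          ~ 0.032
```

The full run works out to about 24.89 MB/s, so the 24.8 quoted above is the same value truncated rather than rounded. The diff is roughly 3% of the source size.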

One more thing: the restriction to 64-bit versions of Windows is artificial and can be disabled, and archives created by the 64-bit build extract fine with a 32-bit build, though not at record speed (nor is there any reason to expect it).
I am not publishing the binaries or the full project source because of license restrictions.

The exdupe website: www.exdupe.com
Sources: www.exdupe.com/old
