Writing a ZLib-based Archiver in .NET


    Why write one


    • because it is convenient to have your own tool in which you can intervene at any stage of archiving
    • because it is interesting
    • because many archivers that offer an API are paid, and for the rest, see the first point


    Technology and Libraries


    You will need the zlib.net.dll library (official site).
    Development environment: Visual Studio 2010
    Language: C#
    Platform: .NET Framework 3.5

    Requirements


    The archiver should be able to:
    • compress files and directories
    • build an archive without compression
    • encrypt data (with and without compression)
    • exclude specified paths
    • delete files after they are compressed
    • unpack a compressed archive


    Design


    Archive format

    After some optimization, I arrived at the following layout:

    Purpose                                            Size
    Archive type                                       1 byte
    Header length (after compression and encryption)   4 bytes
    Header (described in more detail below)            N bytes
    Content block of the first file                    N bytes
    Content block of the second file                   N bytes
    ...                                                ...
    Content block of the Kth file                      N bytes

    Archive Header Format

    Purpose           Size
    Raw header size   4 bytes
    Block 1           N bytes
    Block 2           N bytes
    ...               ...
    Block K           N bytes

    Archive Header Block Format

    Purpose                                 Size
    Block size                              4 bytes
    Absolute path length                    4 bytes
    Absolute path                           N bytes
    Relative path length                    4 bytes
    Relative path                           N bytes
    Size of the object after processing     8 bytes


    A brief explanation. The archive file begins with a header that collects the metadata for all archived objects. The header itself passes through the same compression and encryption stages as the archived files. The header is followed by the blocks holding the processed file contents, packed back to back with no separators. Block boundaries are determined from the header, which stores the size of each block.
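The header block layout above can be sketched in code. This is only an illustration, not the project's actual serializer: the type names HeaderEntry and ArchiveLayout are hypothetical, and the sketch assumes UTF-8 paths, little-endian integers (what BinaryWriter produces), and a block size field that counts the bytes following it. None of these details are stated in the format description.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical type: one header block describing a single archived object.
public class HeaderEntry
{
    public string AbsolutePath;
    public string RelativePath;
    public long ProcessedSize; // size of the content block after compression/encryption

    // Layout: block size (4) | abs length (4) | abs path | rel length (4) | rel path | size (8)
    public void Write(BinaryWriter writer)
    {
        byte[] abs = Encoding.UTF8.GetBytes(AbsolutePath); // assumption: UTF-8 paths
        byte[] rel = Encoding.UTF8.GetBytes(RelativePath);
        int blockSize = 4 + abs.Length + 4 + rel.Length + 8; // assumption: excludes the size field itself
        writer.Write(blockSize);
        writer.Write(abs.Length);
        writer.Write(abs);
        writer.Write(rel.Length);
        writer.Write(rel);
        writer.Write(ProcessedSize);
    }

    public static HeaderEntry Read(BinaryReader reader)
    {
        reader.ReadInt32(); // block size; would let a reader skip the block unparsed
        var entry = new HeaderEntry();
        entry.AbsolutePath = Encoding.UTF8.GetString(reader.ReadBytes(reader.ReadInt32()));
        entry.RelativePath = Encoding.UTF8.GetString(reader.ReadBytes(reader.ReadInt32()));
        entry.ProcessedSize = reader.ReadInt64();
        return entry;
    }
}

public static class ArchiveLayout
{
    // Content blocks are packed back to back after the header, so each offset is
    // the start of the content area plus the sizes of all preceding blocks.
    public static long[] ContentOffsets(int processedHeaderLength, IList<HeaderEntry> entries)
    {
        long offset = 1 + 4 + processedHeaderLength; // type byte + header length field + header
        var offsets = new long[entries.Count];
        for (int i = 0; i < entries.Count; i++)
        {
            offsets[i] = offset;
            offset += entries[i].ProcessedSize;
        }
        return offsets;
    }
}
```

Because the sizes live in the header, the content area needs no delimiters: an unpacker reads the header once and can then seek directly to any file's block.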

    General principles of operation


    The user sets the compression options, which determine which file handlers (compressor, encryptor) are attached. Each handler exposes two methods, Execute and BackExecute. When archiving we call Execute; when unpacking we call BackExecute and apply the handlers in reverse order. This structure makes it very easy to extend the program with any number of new handlers (for example, ones implementing other encryption or compression methods).
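A minimal sketch of this handler chain follows. Only the method names Execute and BackExecute come from the text; the interface name IHandler, the HandlerPipeline class, and the two toy handlers are hypothetical and exist only to show why unpacking has to run the handlers in reverse order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface; the article names only the two methods.
public interface IHandler
{
    byte[] Execute(byte[] data);      // forward step, used when archiving
    byte[] BackExecute(byte[] data);  // inverse step, used when unpacking
}

// Toy handler: XORs every byte with a key. XOR is its own inverse.
public class XorHandler : IHandler
{
    private readonly byte key;
    public XorHandler(byte key) { this.key = key; }

    public byte[] Execute(byte[] data)
    {
        var result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = (byte)(data[i] ^ key);
        return result;
    }

    public byte[] BackExecute(byte[] data) { return Execute(data); }
}

// Toy handler: prepends a marker byte; BackExecute checks and strips it.
public class MarkerHandler : IHandler
{
    private const byte Marker = 0x5A;

    public byte[] Execute(byte[] data)
    {
        var result = new byte[data.Length + 1];
        result[0] = Marker;
        Array.Copy(data, 0, result, 1, data.Length);
        return result;
    }

    public byte[] BackExecute(byte[] data)
    {
        if (data.Length == 0 || data[0] != Marker)
            throw new InvalidOperationException("Marker byte missing.");
        var result = new byte[data.Length - 1];
        Array.Copy(data, 1, result, 0, result.Length);
        return result;
    }
}

public class HandlerPipeline
{
    private readonly List<IHandler> handlers = new List<IHandler>();

    public void Add(IHandler handler) { handlers.Add(handler); }

    // Archiving: apply Execute in the order the handlers were added.
    public byte[] Pack(byte[] data)
    {
        foreach (IHandler handler in handlers)
            data = handler.Execute(data);
        return data;
    }

    // Unpacking: apply BackExecute in reverse order, undoing the last step first.
    public byte[] Unpack(byte[] data)
    {
        for (int i = handlers.Count - 1; i >= 0; i--)
            data = handlers[i].BackExecute(data);
        return data;
    }
}
```

With XorHandler added first and MarkerHandler second, Unpack strips the marker before undoing the XOR. Running BackExecute in forward order would XOR the marker byte first and the marker check would fail, which is exactly the ordering problem the reverse traversal avoids.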

    Algorithm


    1. Determine the archive type (compressed, encrypted)
    2. Read the list of objects to archive
    3. Build the complete list of objects to archive from that list and the exclusion list
    4. Create the archive header (as an in-memory object)
    5. Iterate over the complete list of objects in the header
    6. Process each object, update its post-processing size in the header, and write the processed content to a temporary file
    7. Save the header to a file
    8. Process the header (compression, encryption)
    9. Assemble the resulting archive file
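Step 3 above, expanding the include list and filtering it against the exclusions, might look roughly like the sketch below. The FileListBuilder class is hypothetical, and the matching rules are assumptions: an excluded path removes either that exact file or everything under that directory, and comparison is case-insensitive as on Windows file systems.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not taken from the project sources.
public static class FileListBuilder
{
    public static List<string> Build(IEnumerable<string> includes, IEnumerable<string> excludes)
    {
        var excluded = new List<string>();
        foreach (string path in excludes)
            excluded.Add(Normalize(path));

        var result = new List<string>();
        foreach (string path in includes)
        {
            if (Directory.Exists(path))
            {
                // Directories are expanded recursively into their files.
                foreach (string file in Directory.GetFiles(path, "*", SearchOption.AllDirectories))
                    AddIfAllowed(file, excluded, result);
            }
            else if (File.Exists(path))
            {
                AddIfAllowed(path, excluded, result);
            }
        }
        return result;
    }

    private static string Normalize(string path)
    {
        return Path.GetFullPath(path).TrimEnd(Path.DirectorySeparatorChar);
    }

    private static void AddIfAllowed(string file, List<string> excluded, List<string> result)
    {
        string full = Normalize(file);
        foreach (string ex in excluded)
        {
            // Skip the file if it is excluded itself or lies under an excluded directory.
            // Assumption: case-insensitive comparison, as on Windows.
            if (full.Equals(ex, StringComparison.OrdinalIgnoreCase) ||
                full.StartsWith(ex + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase))
                return;
        }
        result.Add(full);
    }
}
```

Appending the directory separator before the prefix check keeps an exclusion such as a directory named sub from also removing a sibling named subfolder.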


    Implementation


    ZLib can compress and decompress data passed to it as a byte array. That is all we need from it and all we will use. It cannot encrypt data; for that we use the standard .NET Framework library, System.Security.Cryptography.
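To make the byte-array approach concrete, here is a self-contained sketch of the two transforms. It is not the project's code: to avoid depending on zlib.net.dll it substitutes the framework's own System.IO.Compression.DeflateStream for the compression step, and the encryption scheme (AES in its default CBC mode, a key derived from the password with Rfc2898DeriveBytes, a random salt stored in front of the ciphertext, 10000 iterations) is an assumption chosen for illustration. The ByteTransforms class name is hypothetical, and the sketch adds no integrity check.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

// Hypothetical helper; DeflateStream stands in for zlib.net here.
public static class ByteTransforms
{
    private const int SaltSize = 16;

    public static byte[] Compress(byte[] data)
    {
        var output = new MemoryStream();
        using (var deflate = new DeflateStream(output, CompressionMode.Compress))
            deflate.Write(data, 0, data.Length);
        return output.ToArray(); // ToArray still works after the stream is closed
    }

    public static byte[] Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            var buffer = new byte[8192];
            int read;
            while ((read = deflate.Read(buffer, 0, buffer.Length)) > 0)
                output.Write(buffer, 0, read);
            return output.ToArray();
        }
    }

    public static byte[] Encrypt(byte[] data, string password)
    {
        var salt = new byte[SaltSize];
        RandomNumberGenerator.Create().GetBytes(salt);
        using (Aes aes = CreateAes(password, salt))
        using (ICryptoTransform encryptor = aes.CreateEncryptor())
        {
            byte[] cipher = encryptor.TransformFinalBlock(data, 0, data.Length);
            var result = new byte[SaltSize + cipher.Length];
            Buffer.BlockCopy(salt, 0, result, 0, SaltSize);        // salt goes first
            Buffer.BlockCopy(cipher, 0, result, SaltSize, cipher.Length);
            return result;
        }
    }

    public static byte[] Decrypt(byte[] data, string password)
    {
        var salt = new byte[SaltSize];
        Buffer.BlockCopy(data, 0, salt, 0, SaltSize);
        using (Aes aes = CreateAes(password, salt))
        using (ICryptoTransform decryptor = aes.CreateDecryptor())
            return decryptor.TransformFinalBlock(data, SaltSize, data.Length - SaltSize);
    }

    // Derives both key and IV from the password and salt (assumed scheme).
    private static Aes CreateAes(string password, byte[] salt)
    {
        var kdf = new Rfc2898DeriveBytes(password, salt, 10000);
        Aes aes = Aes.Create();
        aes.Key = kdf.GetBytes(32);
        aes.IV = kdf.GetBytes(16);
        return aes;
    }
}
```

Compressing before encrypting is the useful order, since well-encrypted data looks random and barely compresses; unpacking then decrypts first and decompresses second.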
    During archiving and unpacking, the caller can obtain information about the object currently being processed, as well as any errors that occur.
    If an error occurs while processing a file, the user is offered a choice of four actions:
    • abort
    • ignore error
    • ignore all errors
    • repeat

    The prompt can be disabled by simply commenting out the subscription to the ErrorProcessing event; in that case, execution is aborted when an error occurs.
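The four choices and the "no subscriber means abort" behavior can be sketched as follows. Only the event name ErrorProcessing comes from the text; the ErrorAction enum, the ErrorProcessingEventArgs class, and the FileProcessor wrapper are hypothetical and may differ from the project's real types.

```csharp
using System;

// Hypothetical names for the four choices described above.
public enum ErrorAction { Abort, Ignore, IgnoreAll, Retry }

public class ErrorProcessingEventArgs : EventArgs
{
    public readonly string Path;
    public readonly Exception Error;
    public ErrorAction Choice = ErrorAction.Abort; // default if the handler sets nothing

    public ErrorProcessingEventArgs(string path, Exception error)
    {
        Path = path;
        Error = error;
    }
}

public class FileProcessor
{
    public event EventHandler<ErrorProcessingEventArgs> ErrorProcessing;

    private bool ignoreAll;

    // Returns true if the file was processed, false if it was skipped.
    // Rethrows the original exception when the choice is Abort.
    public bool Process(string path, Action<string> work)
    {
        while (true)
        {
            try
            {
                work(path);
                return true;
            }
            catch (Exception ex)
            {
                if (ignoreAll) return false;

                EventHandler<ErrorProcessingEventArgs> handler = ErrorProcessing;
                if (handler == null) throw; // nobody subscribed: abort

                var args = new ErrorProcessingEventArgs(path, ex);
                handler(this, args);

                switch (args.Choice)
                {
                    case ErrorAction.Retry:
                        continue; // run the same file again
                    case ErrorAction.Ignore:
                        return false;
                    case ErrorAction.IgnoreAll:
                        ignoreAll = true; // later errors are skipped without asking
                        return false;
                    default:
                        throw;
                }
            }
        }
    }
}
```

A UI would subscribe to ErrorProcessing, show a dialog with the four buttons, and store the answer in Choice; a batch tool could subscribe with a handler that always answers IgnoreAll.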
    I will not reproduce the program code here; instead, here are links to the sources.

    Direct downloads:
    Project
    As DLLs

    SVN:
    svn://svn.code.sf.net/p/yark/code-0/trunk

    Project:
    sourceforge.net/projects/yark

    And a usage example:
    Compression

    
    ArchiveProvider compressor = new ArchiveProvider();
    using (SaveFileDialog sfd = new SaveFileDialog())
    {
        if (sfd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
        {
            CompressorOption option = new CompressorOption()
            {
                Password = password_if_encrypting,                 // placeholder: password, if the archive is to be encrypted
                WithoutCompress = true_if_without_compression,     // placeholder: true to store without compression
                RemoveSource = true_if_removing_source_files,      // placeholder: true to delete the source files
                Output = sfd.FileName
            };
            // Files and directories to compress
            foreach (string line in lbIncludes.Items)
                option.IncludePath.Add(line);
            // Files and directories to exclude
            foreach (string line in lbExclude.Items)
                option.ExcludePath.Add(line);
            compressor.Compress(option);
        }
    }
    

    Unzipping

    
    ArchiveProvider decompressor = new ArchiveProvider();
    using (FolderBrowserDialog fbd = new FolderBrowserDialog())
    {
        if (fbd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
        {
            decompressor.Decompress(path_to_archive, fbd.SelectedPath, password_if_encrypted);
        }
    }
    


    Comparison of results


    I did not measure the timing; it was roughly the same for all three.
    Input data:
    • a directory of text files (1 430 KB)
    • a directory of mixed data (18 893 KB)


    Archiver        Text (KB)   Mixed data (KB)
    WinRAR          613         8 045
    Zip             638         8 709
    This archiver   588         8 655


    For the RAR and ZIP formats, the normal compression level was selected, which is also what this program uses.
    The current archive format stores the absolute paths of files and directories; dropping them would slightly improve compression.

    Possible improvements


    • store file metadata (creation and modification dates, access rights)
    • add multithreading (simply parallelize the creation of the temporary files)
    • add comments to the archive
    • associate files with the program
