Ogoun November 29, 2011 at 10:29

Writing a ZLib-based Archiver in .NET

Why write

because it is convenient to have your own tool in which you can intervene at any stage of archiving
because it is interesting
because many archivers that offer an API are paid, and as for the free ones, see the first point

Technology and Libraries

The zlib.net.dll library (official site)
Visual Studio 2010 development environment
Language: C#
.NET Framework 3.5

Requirements

The archiver should be able to:

compress files and directories
build an archive without compression
encrypt data (with and without compression)
exclude specified paths
delete files after they are compressed
unpack compressed archive

Design

Archive format

After some optimization, I settled on the following layout:

Purpose	Size
Archive type	1 byte
Header length (after compression and encryption)	4 bytes
Header (described in more detail below)	N bytes
Content block of the first file	N bytes
Content block of the second file	N bytes
......	......
Content block of the Kth file	N bytes

Archive Header Format

Purpose	Size
Raw Header Size	4 bytes
Block 1	N bytes
Block 2	N bytes
......	......
Block K	N bytes

Archive Header Block Format

Purpose	Size
Block size	4 bytes
Absolute path length	4 bytes
Absolute path	N bytes
Relative path length	4 bytes
Relative path	N bytes
Object size after processing	8 bytes
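To make the block layout above concrete, here is a minimal sketch of writing and reading one header block with BinaryWriter and BinaryReader. The class name HeaderBlock, the UTF-8 path encoding, and the choice to count only the bytes after the size field in "block size" are my assumptions for illustration; the actual project may do this differently.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical in-memory form of one header block (names are illustrative).
public class HeaderBlock
{
    public string AbsolutePath;
    public string RelativePath;
    public long ProcessedSize; // object size after compression / encryption

    // Layout: block size (4), absolute path length (4), absolute path,
    // relative path length (4), relative path, processed size (8).
    public void Write(BinaryWriter writer)
    {
        byte[] abs = Encoding.UTF8.GetBytes(AbsolutePath); // assumed encoding
        byte[] rel = Encoding.UTF8.GetBytes(RelativePath);
        // Assumption: the block size counts the bytes that follow the size field.
        int blockSize = 4 + abs.Length + 4 + rel.Length + 8;
        writer.Write(blockSize);
        writer.Write(abs.Length);
        writer.Write(abs);
        writer.Write(rel.Length);
        writer.Write(rel);
        writer.Write(ProcessedSize);
    }

    public static HeaderBlock Read(BinaryReader reader)
    {
        int blockSize = reader.ReadInt32();
        byte[] body = reader.ReadBytes(blockSize);
        if (body.Length != blockSize)
            throw new EndOfStreamException("Truncated header block.");

        using (BinaryReader r = new BinaryReader(new MemoryStream(body)))
        {
            HeaderBlock block = new HeaderBlock();
            block.AbsolutePath = Encoding.UTF8.GetString(r.ReadBytes(r.ReadInt32()));
            block.RelativePath = Encoding.UTF8.GetString(r.ReadBytes(r.ReadInt32()));
            block.ProcessedSize = r.ReadInt64();
            return block;
        }
    }
}
```

Because every block starts with its own size, a reader can skip a block it does not understand by advancing that many bytes.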

A little explanation. A header at the beginning of the archive file collects all the metadata for the archived objects. The header itself goes through the same compression and encryption stages as the archived files. After the header come the blocks that store the processed contents of the files; the blocks are stored back to back, with no separators between them. Block boundaries are derived from the header, which stores the size of each block.
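Since the content blocks have no separators, their positions have to be computed from the sizes recorded in the header. A rough sketch of that arithmetic follows; the class and method names are hypothetical, and it assumes the blocks appear in the same order as their header entries.

```csharp
using System;
using System.Collections.Generic;

public static class ArchiveLayout
{
    // Fixed prefix: 1 byte archive type + 4 bytes processed header length.
    public const int PrefixSize = 1 + 4;

    // Returns the absolute file offset of each content block.
    // processedHeaderLength: header length after compression and encryption.
    // processedSizes: per-object sizes after processing, in header order (assumed).
    public static long[] ContentOffsets(int processedHeaderLength, IList<long> processedSizes)
    {
        long[] offsets = new long[processedSizes.Count];
        long position = PrefixSize + (long)processedHeaderLength;
        for (int i = 0; i < processedSizes.Count; i++)
        {
            offsets[i] = position;          // block i starts where block i-1 ended
            position += processedSizes[i];
        }
        return offsets;
    }
}
```

For example, with a processed header of 120 bytes and blocks of 300, 50, and 1000 bytes, the blocks start at offsets 125, 425, and 475.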

General principles of operation

The user sets the compression options, and the required file handlers (compressor, encryptor) are connected based on them. Each handler has two methods, Execute and BackExecute. When archiving, we call Execute; when extracting, we call BackExecute and apply the handlers in reverse order. This structure makes it very easy to extend the program with any number of new handlers (for example, ones implementing other encryption or compression methods).
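The handler idea can be sketched roughly as follows. This is not the project's code: the IHandler interface and its byte[] signatures are assumptions, DeflateStream (raw DEFLATE, without the zlib wrapper) stands in for zlib.net so that the example has no external dependency, and XorHandler is a toy stage that only demonstrates ordering and is not real encryption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical handler contract; the byte[] signatures are an assumption.
public interface IHandler
{
    byte[] Execute(byte[] data);     // applied when archiving
    byte[] BackExecute(byte[] data); // applied when extracting
}

// Stand-in compressor built on DeflateStream instead of zlib.net.
public class DeflateHandler : IHandler
{
    public byte[] Execute(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress))
            {
                deflate.Write(data, 0, data.Length);
            } // disposing flushes the final compressed bytes
            return output.ToArray();
        }
    }

    public byte[] BackExecute(byte[] data)
    {
        using (DeflateStream inflate = new DeflateStream(new MemoryStream(data), CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = inflate.Read(buffer, 0, buffer.Length)) > 0)
                output.Write(buffer, 0, read);
            return output.ToArray();
        }
    }
}

// Toy second stage (XOR with one byte). NOT encryption; it only shows ordering.
public class XorHandler : IHandler
{
    private readonly byte key;
    public XorHandler(byte key) { this.key = key; }

    public byte[] Execute(byte[] data)
    {
        byte[] result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = (byte)(data[i] ^ key);
        return result;
    }

    public byte[] BackExecute(byte[] data) { return Execute(data); }
}

public class Pipeline
{
    private readonly List<IHandler> handlers = new List<IHandler>();

    public void Add(IHandler handler) { handlers.Add(handler); }

    // Archiving: handlers run in the order they were added.
    public byte[] Execute(byte[] data)
    {
        foreach (IHandler handler in handlers)
            data = handler.Execute(data);
        return data;
    }

    // Extracting: the same handlers run in reverse order.
    public byte[] BackExecute(byte[] data)
    {
        for (int i = handlers.Count - 1; i >= 0; i--)
            data = handlers[i].BackExecute(data);
        return data;
    }
}
```

Adding DeflateHandler first and XorHandler second means data is compressed and then scrambled on the way in, and unscrambled and then decompressed on the way out. Compressing before encrypting is the useful order, since encrypted data does not compress well.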

Algorithm

Determine the archive type (compressed, encrypted)
Read the list of objects to archive
Build the complete list of objects to archive from that list and the list of exclusions
Create the archive header (as an in-memory object)
Iterate over the complete list of objects in the header
Process each object, update its processed size in the header, and write the processed content to a temporary file
Save the header to a file
Process the header (compression, encryption)
Assemble the resulting archive file
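The step that builds the complete object list might look roughly like the sketch below. FileListBuilder is a hypothetical name, exclusion is done by simple path-prefix matching, paths are compared case-insensitively (a Windows assumption), and only files are collected, so empty directories are ignored here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileListBuilder
{
    // Expands the include paths into files and drops everything under an excluded path.
    public static List<string> Build(IEnumerable<string> includes, IEnumerable<string> excludes)
    {
        List<string> excluded = new List<string>();
        foreach (string path in excludes)
            excluded.Add(Normalize(path));

        List<string> result = new List<string>();
        foreach (string include in includes)
        {
            if (Directory.Exists(include))
            {
                foreach (string file in Directory.GetFiles(include, "*", SearchOption.AllDirectories))
                    AddIfAllowed(file, excluded, result);
            }
            else if (File.Exists(include))
            {
                AddIfAllowed(include, excluded, result);
            }
        }
        return result;
    }

    private static string Normalize(string path)
    {
        return Path.GetFullPath(path)
                   .TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
    }

    private static void AddIfAllowed(string file, List<string> excluded, List<string> result)
    {
        string full = Normalize(file);
        foreach (string ex in excluded)
        {
            // Excluded if it is the excluded path itself or lies inside it.
            if (full.Equals(ex, StringComparison.OrdinalIgnoreCase) ||
                full.StartsWith(ex + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase))
                return;
        }
        if (!result.Contains(full))
            result.Add(full); // avoid duplicates when include paths overlap
    }
}
```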

Implementation

ZLib can compress and decompress data passed to it as an array of bytes. That is all we need from it and all we will use. It cannot encrypt data; for that we use System.Security.Cryptography from the standard .NET Framework library.
During archiving and extraction, you can get information about the object currently being processed, as well as about any errors that occur.
If an error occurs while processing a file, the user is offered a choice of four actions:
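For the encryption side, here is a minimal sketch of password-based encryption of a byte array with System.Security.Cryptography. The article does not say which algorithm or key derivation the project uses, so AES in its default CBC mode, PBKDF2 via Rfc2898DeriveBytes, the 16-byte salt prefix, and the iteration count are all my assumptions. On recent .NET versions this Rfc2898DeriveBytes constructor is marked obsolete but still works.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class PasswordCrypto
{
    private const int SaltSize = 16;
    private const int Iterations = 10000; // illustrative value

    // Output layout (assumed): 16-byte random salt followed by the ciphertext.
    public static byte[] Encrypt(byte[] data, string password)
    {
        byte[] salt = new byte[SaltSize];
        RandomNumberGenerator.Create().GetBytes(salt);

        using (Aes aes = CreateAes(password, salt))
        using (MemoryStream output = new MemoryStream())
        {
            output.Write(salt, 0, salt.Length);
            using (CryptoStream crypto = new CryptoStream(output, aes.CreateEncryptor(), CryptoStreamMode.Write))
            {
                crypto.Write(data, 0, data.Length);
            } // disposing writes the final padded block
            return output.ToArray();
        }
    }

    public static byte[] Decrypt(byte[] data, string password)
    {
        byte[] salt = new byte[SaltSize];
        Array.Copy(data, 0, salt, 0, SaltSize);

        using (Aes aes = CreateAes(password, salt))
        using (MemoryStream output = new MemoryStream())
        {
            using (CryptoStream crypto = new CryptoStream(output, aes.CreateDecryptor(), CryptoStreamMode.Write))
            {
                crypto.Write(data, SaltSize, data.Length - SaltSize);
            }
            return output.ToArray();
        }
    }

    // Derives a 256-bit key and a 128-bit IV from the password and salt.
    private static Aes CreateAes(string password, byte[] salt)
    {
        Rfc2898DeriveBytes kdf = new Rfc2898DeriveBytes(password, salt, Iterations);
        Aes aes = Aes.Create();
        aes.Key = kdf.GetBytes(32);
        aes.IV = kdf.GetBytes(16);
        return aes;
    }
}
```

Because both methods take and return byte arrays, they fit the same shape as the compression step and can be applied to the header as well as to file contents.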

abort
ignore error
ignore all errors
retry

The action prompt can be disabled by simply commenting out the ErrorProcessing event handler; in that case program execution is aborted when an error occurs.
I will not list the program code here; instead, here are links to the sources.
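One way such an error prompt could be wired up is sketched below. Only the event name ErrorProcessing comes from the article; the enum, the event arguments class, and the processing loop are hypothetical and only illustrate the four choices.

```csharp
using System;
using System.Collections.Generic;

public enum ErrorAction { Abort, Ignore, IgnoreAll, Retry }

// Hypothetical event arguments: the handler sets Action to tell the processor what to do.
public class ErrorProcessingEventArgs : EventArgs
{
    public string Path;
    public Exception Error;
    public ErrorAction Action = ErrorAction.Abort;
}

public class Processor
{
    public event EventHandler<ErrorProcessingEventArgs> ErrorProcessing;

    public void ProcessAll(IEnumerable<string> paths, Action<string> processOne)
    {
        bool ignoreAll = false;
        foreach (string path in paths)
        {
            while (true)
            {
                try
                {
                    processOne(path);
                    break; // success, move on to the next object
                }
                catch (Exception ex)
                {
                    if (ignoreAll) break;
                    ErrorAction action = Ask(path, ex);
                    if (action == ErrorAction.Retry) continue;
                    if (action == ErrorAction.Ignore) break;
                    if (action == ErrorAction.IgnoreAll) { ignoreAll = true; break; }
                    throw; // Abort: let the error stop the whole operation
                }
            }
        }
    }

    // With no subscriber the answer is Abort, so processing is interrupted.
    private ErrorAction Ask(string path, Exception error)
    {
        EventHandler<ErrorProcessingEventArgs> handler = ErrorProcessing;
        if (handler == null) return ErrorAction.Abort;

        ErrorProcessingEventArgs args = new ErrorProcessingEventArgs();
        args.Path = path;
        args.Error = error;
        handler(this, args);
        return args.Action;
    }
}
```

In a WinForms application the handler would typically show a dialog with the four buttons and store the user's choice in Action.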

Direct downloads:
Project
As DLLs

SVN:
svn://svn.code.sf.net/p/yark/code-0/trunk

Project:
sourceforge.net/projects/yark

And a usage example:

Compression


ArchiveProvider compressor = new ArchiveProvider();
using (SaveFileDialog sfd = new SaveFileDialog())
{
    if (sfd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
    {
        CompressorOption option = new CompressorOption()
        {
            Password = password_if_encrypting,
            WithoutCompress = true_if_no_compression,
            RemoveSource = true_if_removing_source_files,
            Output = sfd.FileName
        };
        // Files and directories to compress
        foreach (string line in lbIncludes.Items)
            option.IncludePath.Add(line);
        // Files and directories to exclude
        foreach (string line in lbExclude.Items)
            option.ExcludePath.Add(line);
        compressor.Compress(option);
    }
}

Unzipping


ArchiveProvider decompressor = new ArchiveProvider();
using (FolderBrowserDialog fbd = new FolderBrowserDialog())
{
    if (fbd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
    {
        decompressor.Decompress(path_to_archive, fbd.SelectedPath, password_if_encrypted);
    }
}

Comparing the results

I did not time the runs; the speed was roughly the same.
Test data:

a directory with text files (1 430 KB)
a directory with mixed data (18 893 KB)

Archiver	Text, KB	Mixed data, KB
WinRAR	613	8 045
Zip	638	8 709
This program	588	8 655

For the RAR and ZIP formats, the normal compression level was selected, the same one this program uses.
The current archive format stores the absolute paths of files and directories; dropping them would slightly improve compression.

Possible improvements

store file metadata (creation / modification dates, access rights)
add multithreading (simply parallelize the creation of the temporary files)
add comments to the archive
associate archive files with the program
