Writing a ZLib-based Archiver in .NET
Why write one
- because it is convenient to have your own tool whose archiving process you can hook into at any stage
- because it is interesting
- because many archivers that expose an API are paid; for the rest, see the first point
Technology and Libraries
- Library: zlib.net.dll (available from the official site)
- Development environment: Visual Studio 2010
- Language: C#
- Framework: .NET Framework 3.5
Requirements
The archiver should be able to:
- compress files and directories
- build an archive without compression
- encrypt data (with and without compression)
- exclude specified paths
- delete source files after they are compressed
- unpack a compressed archive
Design
Archive format
After several rounds of optimization I settled on the following layout:
Purpose | Size |
Archive type | 1 byte |
Header length (after compression and encryption) | 4 bytes |
Header (described in more detail below) | N bytes |
Content block of the first file | N bytes |
Content block of the second file | N bytes |
...... | ...... |
Content block of the Kth file | N bytes |
Archive Header Format
Purpose | Size |
Raw Header Size | 4 bytes |
Block 1 | N bytes |
Block 2 | N bytes |
...... | ...... |
Block K | N bytes |
Archive Header Block Format
Purpose | Size |
Block size | 4 bytes |
Absolute path length | 4 bytes |
Absolute path | N bytes |
Relative path length | 4 bytes |
Relative path | N bytes |
Object size after processing | 8 bytes |
A brief explanation. The archive file starts with a header that collects the metadata for all archived objects. The header itself goes through the same compression and encryption stages as the archived files. After the header come the blocks holding the processed file contents, packed back to back with no separators. Block boundaries are derived from the header, which stores the size of each block.
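To make the header block layout concrete, here is a minimal sketch of writing and reading one block with BinaryWriter and BinaryReader. The class names HeaderBlock and HeaderBlockCodec are hypothetical and are not taken from the project sources. The sketch also assumes that paths are stored as UTF-8, that integers are little-endian (which is what BinaryWriter produces), and that the "block size" field counts the bytes that follow it.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical in-memory form of one header block; field names are illustrative.
public sealed class HeaderBlock
{
    public string AbsolutePath;
    public string RelativePath;
    public long ProcessedSize; // size of the object's content block after processing
}

public static class HeaderBlockCodec
{
    // Assumption: paths are UTF-8 and "block size" counts the bytes after the size field.
    public static byte[] Write(HeaderBlock block)
    {
        byte[] abs = Encoding.UTF8.GetBytes(block.AbsolutePath);
        byte[] rel = Encoding.UTF8.GetBytes(block.RelativePath);
        using (MemoryStream ms = new MemoryStream())
        using (BinaryWriter w = new BinaryWriter(ms))
        {
            w.Write(4 + abs.Length + 4 + rel.Length + 8); // block size, 4 bytes
            w.Write(abs.Length);                          // absolute path length, 4 bytes
            w.Write(abs);                                 // absolute path, N bytes
            w.Write(rel.Length);                          // relative path length, 4 bytes
            w.Write(rel);                                 // relative path, N bytes
            w.Write(block.ProcessedSize);                 // size after processing, 8 bytes
            w.Flush();
            return ms.ToArray();
        }
    }

    public static HeaderBlock Read(BinaryReader r)
    {
        r.ReadInt32(); // block size; a reader could use it to skip a block it does not need
        HeaderBlock b = new HeaderBlock();
        b.AbsolutePath = Encoding.UTF8.GetString(r.ReadBytes(r.ReadInt32()));
        b.RelativePath = Encoding.UTF8.GetString(r.ReadBytes(r.ReadInt32()));
        b.ProcessedSize = r.ReadInt64();
        return b;
    }
}
```

Because each block records the processed size of its object, a reader that has decoded the whole header can find any content block by summing the sizes of the blocks that precede it.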
General principles of work
The user sets the compression options, which determine which file handlers (compressor, encryptor) are plugged in. Each handler has two methods, Execute and BackExecute. When archiving we call Execute; when unpacking we call BackExecute and apply the handlers in reverse order. This structure makes it easy to extend the program with any number of new handlers (for example, ones that implement other encryption or compression methods).
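The handler idea can be sketched as follows. Only the method names Execute and BackExecute come from the article; the IHandler interface, the byte-array signatures and the Pipeline class are assumptions made for illustration. The compressor here uses the framework's own DeflateStream as a stand-in for zlib.net, and XorHandler is a toy placeholder for an encryption handler, not real encryption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Assumed shape of a handler; only the method names come from the article.
public interface IHandler
{
    byte[] Execute(byte[] data);      // applied when archiving
    byte[] BackExecute(byte[] data);  // applied when unpacking
}

// Stand-in compressor built on the BCL DeflateStream rather than zlib.net.
public sealed class DeflateHandler : IHandler
{
    public byte[] Execute(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // leaveOpen = true so the deflate stream can be closed (and flushed) first
            using (DeflateStream ds = new DeflateStream(output, CompressionMode.Compress, true))
                ds.Write(data, 0, data.Length);
            return output.ToArray();
        }
    }

    public byte[] BackExecute(byte[] data)
    {
        using (MemoryStream input = new MemoryStream(data))
        using (DeflateStream ds = new DeflateStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = ds.Read(buffer, 0, buffer.Length)) > 0)
                output.Write(buffer, 0, read);
            return output.ToArray();
        }
    }
}

// Toy placeholder for an encryption handler: XOR with one byte. Not secure.
public sealed class XorHandler : IHandler
{
    private readonly byte key;
    public XorHandler(byte key) { this.key = key; }

    public byte[] Execute(byte[] data)
    {
        byte[] result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++) result[i] = (byte)(data[i] ^ key);
        return result;
    }

    public byte[] BackExecute(byte[] data) { return Execute(data); }
}

public static class Pipeline
{
    // Archiving applies the handlers in list order.
    public static byte[] Pack(IList<IHandler> handlers, byte[] data)
    {
        for (int i = 0; i < handlers.Count; i++) data = handlers[i].Execute(data);
        return data;
    }

    // Unpacking walks the same list backwards.
    public static byte[] Unpack(IList<IHandler> handlers, byte[] data)
    {
        for (int i = handlers.Count - 1; i >= 0; i--) data = handlers[i].BackExecute(data);
        return data;
    }
}
```

Compressing before encrypting is the usual order, since well-encrypted data looks random and barely compresses. Adding a new processing stage then means adding one more class to the list.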
Algorithm
- Determine the archive type (compressed, encrypted)
- Read the list of objects to archive
- Build the complete list of objects to archive from that list and the exclusion list
- Create the archive header (as an object)
- Iterate over the complete list of objects in the header
- Process each object, update its post-processing size in the header, and write the processed content to a temporary file
- Save the header to a file
- Process the header (compression, encryption)
- Assemble the resulting archive file
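The step that builds the complete list of objects from the include list and the exclusions could look roughly like this. FileListBuilder is a hypothetical name, and the matching rule is an assumption: a path counts as excluded when it equals an excluded path or lies beneath it, compared case-insensitively as is usual on Windows. The project itself may use a different rule.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileListBuilder
{
    // Assumption: a path is excluded when it equals an excluded path or lies beneath it.
    public static bool IsExcluded(string path, IEnumerable<string> excludes)
    {
        string full = Path.GetFullPath(path);
        foreach (string ex in excludes)
        {
            string exFull = Path.GetFullPath(ex).TrimEnd(Path.DirectorySeparatorChar);
            if (full.Equals(exFull, StringComparison.OrdinalIgnoreCase) ||
                full.StartsWith(exFull + Path.DirectorySeparatorChar, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }

    // Expands files and directories from 'includes' into a flat list of files,
    // skipping everything matched by 'excludes'.
    public static List<string> Build(IEnumerable<string> includes, IEnumerable<string> excludes)
    {
        List<string> result = new List<string>();
        foreach (string inc in includes)
        {
            if (File.Exists(inc))
            {
                if (!IsExcluded(inc, excludes)) result.Add(Path.GetFullPath(inc));
            }
            else if (Directory.Exists(inc))
            {
                foreach (string file in Directory.GetFiles(inc, "*", SearchOption.AllDirectories))
                    if (!IsExcluded(file, excludes)) result.Add(Path.GetFullPath(file));
            }
        }
        return result;
    }
}
```

This sketch collects files only; storing empty directories would need an extra pass over Directory.GetDirectories.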
Implementation
ZLib can compress and decompress data passed to it as a byte array. That is all we need from it and all we will use. It cannot encrypt data; for that we use the standard .NET Framework namespace System.Security.Cryptography.
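As an illustration of the encryption side, here is a minimal sketch of password-based encryption of a byte array with System.Security.Cryptography. The article does not say which algorithm or key derivation the project uses, so everything here is an assumption: AES in its default CBC mode, a key derived with Rfc2898DeriveBytes (PBKDF2), and a random salt and IV stored in front of the ciphertext. PasswordCrypto is a hypothetical name.

```csharp
using System;
using System.Security.Cryptography;

public static class PasswordCrypto
{
    private const int SaltSize = 16;
    private const int Iterations = 10000; // illustrative value, not taken from the project

    // Output layout (an assumption of this sketch): salt | IV | ciphertext.
    public static byte[] Encrypt(byte[] data, string password)
    {
        byte[] salt = new byte[SaltSize];
        RandomNumberGenerator.Create().GetBytes(salt);
        // Newer .NET versions offer overloads that name the hash algorithm explicitly.
        Rfc2898DeriveBytes kdf = new Rfc2898DeriveBytes(password, salt, Iterations);
        using (Aes aes = Aes.Create())
        {
            aes.Key = kdf.GetBytes(32); // 256-bit key
            aes.GenerateIV();
            byte[] cipher;
            using (ICryptoTransform enc = aes.CreateEncryptor())
                cipher = enc.TransformFinalBlock(data, 0, data.Length);

            byte[] result = new byte[salt.Length + aes.IV.Length + cipher.Length];
            Buffer.BlockCopy(salt, 0, result, 0, salt.Length);
            Buffer.BlockCopy(aes.IV, 0, result, salt.Length, aes.IV.Length);
            Buffer.BlockCopy(cipher, 0, result, salt.Length + aes.IV.Length, cipher.Length);
            return result;
        }
    }

    public static byte[] Decrypt(byte[] blob, string password)
    {
        byte[] salt = new byte[SaltSize];
        Buffer.BlockCopy(blob, 0, salt, 0, SaltSize);
        Rfc2898DeriveBytes kdf = new Rfc2898DeriveBytes(password, salt, Iterations);
        using (Aes aes = Aes.Create())
        {
            byte[] iv = new byte[aes.BlockSize / 8];
            Buffer.BlockCopy(blob, SaltSize, iv, 0, iv.Length);
            aes.Key = kdf.GetBytes(32);
            aes.IV = iv;
            int offset = SaltSize + iv.Length;
            using (ICryptoTransform dec = aes.CreateDecryptor())
                return dec.TransformFinalBlock(blob, offset, blob.Length - offset);
        }
    }
}
```

The sketch has no integrity check, so a wrong password usually surfaces as a CryptographicException from the padding check rather than as a clear "wrong password" message. Wrapped in a handler with Execute and BackExecute, it would run after the compression step.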
During archiving and unpacking, the caller can receive information about the object currently being processed, as well as about any errors that occur.
If an error occurs while processing a file, the user is offered a choice of four actions:
- abort
- ignore the error
- ignore all errors
- retry
The prompt can be turned off by commenting out the ErrorProcessing event subscription, in which case execution is interrupted when an error occurs.
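One way to model the four choices is an event whose arguments carry the decision back to the processing loop. Only the event name ErrorProcessing comes from the article; the ErrorAction enum, the event argument class and the Processor class below are hypothetical, and the project's actual signatures may differ.

```csharp
using System;
using System.Collections.Generic;

public enum ErrorAction { Abort, Ignore, IgnoreAll, Retry }

// Hypothetical event arguments: the subscriber sets Decision to tell the loop what to do.
public sealed class ErrorProcessingEventArgs : EventArgs
{
    public readonly string ObjectPath;
    public readonly Exception Error;
    public ErrorAction Decision = ErrorAction.Abort; // used when nobody changes it

    public ErrorProcessingEventArgs(string objectPath, Exception error)
    {
        ObjectPath = objectPath;
        Error = error;
    }
}

public sealed class Processor
{
    public event EventHandler<ErrorProcessingEventArgs> ErrorProcessing;

    // Runs 'work' for every path and returns the paths processed successfully.
    public List<string> Run(IEnumerable<string> paths, Action<string> work)
    {
        List<string> done = new List<string>();
        bool ignoreAll = false;
        foreach (string path in paths)
        {
            while (true)
            {
                try
                {
                    work(path);
                    done.Add(path);
                    break;
                }
                catch (Exception ex)
                {
                    if (ignoreAll) break; // skip this object silently
                    ErrorProcessingEventArgs args = new ErrorProcessingEventArgs(path, ex);
                    EventHandler<ErrorProcessingEventArgs> handler = ErrorProcessing;
                    if (handler != null) handler(this, args);

                    if (args.Decision == ErrorAction.Retry) continue;
                    if (args.Decision == ErrorAction.Ignore) break;
                    if (args.Decision == ErrorAction.IgnoreAll) { ignoreAll = true; break; }
                    throw; // Abort, or no subscriber: execution is interrupted
                }
            }
        }
        return done;
    }
}
```

With no subscriber the default decision is Abort, which matches the behaviour described above when the event subscription is commented out.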
I will not reproduce the program code here; the sources are available at the links below.
Direct downloads:
- the project
- the compiled DLLs
SVN:
svn://svn.code.sf.net/p/yark/code-0/trunk
Project page:
sourceforge.net/projects/yark
Usage example:
Compression
ArchiveProvider compressor = new ArchiveProvider();
using (SaveFileDialog sfd = new SaveFileDialog())
{
    if (sfd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
    {
        CompressorOption option = new CompressorOption()
        {
            Password = password_if_encrypting,
            WithoutCompress = true_if_no_compression,
            RemoveSource = true_if_deleting_source_files,
            Output = sfd.FileName
        };
        // Files and directories to compress
        foreach (string line in lbIncludes.Items)
            option.IncludePath.Add(line);
        // Files and directories to exclude
        foreach (string line in lbExclude.Items)
            option.ExcludePath.Add(line);
        compressor.Compress(option);
    }
}
Unpacking
ArchiveProvider decompressor = new ArchiveProvider();
using (FolderBrowserDialog fbd = new FolderBrowserDialog())
{
    if (fbd.ShowDialog() == System.Windows.Forms.DialogResult.OK)
    {
        decompressor.Decompress(path_to_archive, fbd.SelectedPath, password_if_encrypted);
    }
}
Comparison of results
I did not measure running time precisely; it was roughly the same for all three.
Input data:
- a directory of text files (1 430 KB)
- a directory of mixed data (18 893 KB)
Archiver | Text, KB | Mixed data, KB |
WinRAR | 613 | 8 045 |
ZIP | 638 | 8 709 |
This archiver | 588 | 8 655 |
For the RAR and ZIP formats the normal compression level was selected, which is also what the program uses.
The current archive format stores the absolute paths of files and directories; dropping them would improve compression slightly.
Possible improvements
- store file metadata (creation and modification dates, access rights)
- add multithreading (simply parallelize the creation of the temporary files)
- add comments to the archive
- associate archive files with the program