ZIP Compression

    Hello Habr!
    This article is about how to compress files into ZIP archives correctly and as tightly as possible. I decided to write it because so many applications package their file formats as ZIP archives. It covers the ZIP compression methods, the applications that perform ZIP compression, and ways to improve the compression.

    ZIP compression method


    To begin with, ZIP supports several compression methods (Copy, Deflate, Deflate64, BZip2, LZMA, PPMd), but we will consider only one of them, Deflate, because it is the method used by most applications that pack their formats in ZIP. A short list of file formats that are actually ZIP archives can be found at open-file.ru (enter the ASCII header signature PK in the search). Note that this is only a small part of such formats.
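
    As a minimal sketch of the PK signature check mentioned above, the snippet below tests whether data begins with the ZIP local file header signature. The helper names and the sample archive are hypothetical and only serve the illustration. It also assumes an ordinary, non-empty archive with no data prepended: empty and self-extracting archives begin with other bytes.

```python
import io
import zipfile

# Local file header signature: the ASCII letters "PK" followed by bytes 0x03 0x04.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def starts_with_pk(data: bytes) -> bool:
    """Return True if the data begins with a ZIP local file header signature."""
    return data[:4] == ZIP_LOCAL_HEADER


def build_sample_zip() -> bytes:
    """Create a tiny in-memory ZIP archive (hypothetical content, for illustration)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("hello.txt", "Hello Habr!")
    return buffer.getvalue()


sample = build_sample_zip()
print("signature check:", starts_with_pk(sample))
# The standard library performs a more thorough check of the archive structure.
print("zipfile.is_zipfile:", zipfile.is_zipfile(io.BytesIO(sample)))
```

    The same check can be applied to the first four bytes of any document or application package to see whether it is a ZIP container.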

    Deflate Compression Method


    Today there are several libraries based on the Deflate compression method:
    Deflate library    Speed      Compression ratio    Applications
    Zlib               High       Low
    7-zip              Average    Average              7-zip, advzip
    Kzip               Low        High                 kzip
    So before choosing a ZIP archiver, you need to decide what result you need and how much time you are willing to spend to get it. With Deflate, the higher the compression ratio, the longer the compression takes.
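
    The trade-off between time and compression ratio can be observed even within a single library. The sketch below uses Python's zlib module, compresses the same data at several levels, and records the size and time for each. The sample text and helper names are hypothetical. Note that zlib.compress produces a zlib-wrapped Deflate stream, so the sizes differ by a few bytes from what would be stored inside a ZIP archive.

```python
import random
import time
import zlib


def make_sample_text(word_count: int = 50_000) -> bytes:
    """Build deterministic, text-like sample data (hypothetical test input)."""
    rng = random.Random(0)
    vocabulary = [
        "zip", "deflate", "archive", "compress", "ratio", "speed",
        "file", "header", "block", "huffman", "window", "match",
    ]
    words = (rng.choice(vocabulary) for _ in range(word_count))
    return " ".join(words).encode("ascii")


def compress_at_levels(data: bytes, levels=(1, 6, 9)) -> dict:
    """Compress data at each level; return {level: (size_in_bytes, seconds)}."""
    results = {}
    for level in levels:
        started = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - started
        # Sanity check: the compressed stream must decompress to the input.
        assert zlib.decompress(packed) == data
        results[level] = (len(packed), elapsed)
    return results


data = make_sample_text()
print(f"original: {len(data)} bytes")
for level, (size, seconds) in compress_at_levels(data).items():
    print(f"level {level}: {size} bytes in {seconds:.4f} s")
```

    The exact numbers depend on the data and the machine, so they should be measured on your own files rather than taken from this sample.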


    ZIP archivers


    In this section, we will consider only those applications that are free to use.

    7-zip algorithm

    Here we will look at two programs that implement the 7-zip algorithm: 7-zip and advzip.
    When creating a ZIP archive with 7-zip, I use the following options:

    -r -mm=Deflate -y -tzip -mpass=15 -mfb=258 -mx9
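
    For scripted use, these options can be passed to the 7-zip command-line program. The sketch below is only an illustration: it assumes the executable is available on the PATH under the name 7z, and the function names and paths are hypothetical. The switches themselves are the ones listed above.

```python
import subprocess
from typing import List


def build_7z_zip_command(archive_path: str, source_path: str,
                         executable: str = "7z") -> List[str]:
    """Build the argument list for creating a Deflate ZIP archive with 7-zip.

    The executable name "7z" is an assumption; it may differ on your system.
    """
    return [
        executable, "a",       # "a" is the 7-zip command for adding files
        "-tzip",               # archive type: ZIP
        "-mm=Deflate",         # compression method: Deflate
        "-mx9",                # maximum compression level
        "-mfb=258",            # number of fast bytes
        "-mpass=15",           # number of passes
        "-r",                  # recurse into subdirectories
        "-y",                  # answer "yes" to all prompts
        archive_path,
        source_path,
    ]


def create_zip(archive_path: str, source_path: str) -> None:
    """Run 7-zip; raises CalledProcessError if 7-zip reports a failure."""
    subprocess.run(build_7z_zip_command(archive_path, source_path), check=True)


# Show the command without running it (paths are hypothetical).
print(" ".join(build_7z_zip_command("game.zip", "game_folder")))
```

    Keeping the command construction separate from the call makes it easy to inspect or log the exact command line before running a long compression job.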

    The distinctive feature of advzip is that it works with existing ZIP archives: you just specify the path to the archive, and it tries to recompress it itself. This is convenient when you already have an archive and do not want to unpack it and pack it again.

    Kzip algorithm

    The kzip algorithm is implemented in the kzip application. It runs extremely slowly but almost always gives the best result. It has options (/s, /n, /b) that can improve or worsen the ZIP compression ratio.

    Recommendations


    Here are some recommendations on how to get the best compression ratio (they are based on personal experience):
    • If the files you are archiving include ZIP archives, I recommend decompressing those inner archives first, that is, repacking them without compression (for convenience, you can use advzip with the /z0 option). The reason is that ZIP with Deflate does not support solid archives: each file is compressed independently. When Deflate then compresses an uncompressed inner archive, it sees that archive as one whole file, so its contents are compressed together, as in a solid archive.
    • If you want the maximum compression, you can use the zipmix application. Suppose you created two ZIP archives with the same content using kzip with different settings and got archives of different sizes. That does not mean every file in the smaller archive is compressed better than its counterpart in the other archive. This is what zipmix is for: it builds a third, smaller archive from the two by comparing each file individually and taking the version with the smaller compressed size. zipmix also works with archives that were not created by kzip.
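
    The per-file comparison described in the zipmix recommendation can be sketched with Python's zipfile module. This is not zipmix's actual implementation, only an illustration of the selection step: it reports which archive holds the smaller compressed version of each file and does not write the combined archive. All function and file names are hypothetical.

```python
import io
import zipfile


def make_zip(entries) -> io.BytesIO:
    """Build an in-memory ZIP from (name, data, compression_method) tuples."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data, method in entries:
            archive.writestr(name, data, compress_type=method)
    buffer.seek(0)
    return buffer


def pick_smallest_entries(archive_a, archive_b) -> dict:
    """For each file present in both archives, choose the smaller compressed copy.

    Returns {filename: (source, compressed_size)}, where source is "a" or "b".
    """
    with zipfile.ZipFile(archive_a) as za, zipfile.ZipFile(archive_b) as zb:
        sizes_a = {info.filename: info.compress_size for info in za.infolist()}
        sizes_b = {info.filename: info.compress_size for info in zb.infolist()}
    choice = {}
    for name in sorted(sizes_a.keys() & sizes_b.keys()):
        if sizes_a[name] <= sizes_b[name]:
            choice[name] = ("a", sizes_a[name])
        else:
            choice[name] = ("b", sizes_b[name])
    return choice


# Two archives with the same content, each better on a different file.
payload = b"a" * 1000
first = make_zip([("one.txt", payload, zipfile.ZIP_DEFLATED),
                  ("two.txt", payload, zipfile.ZIP_STORED)])
second = make_zip([("one.txt", payload, zipfile.ZIP_STORED),
                   ("two.txt", payload, zipfile.ZIP_DEFLATED)])
for name, (source, size) in pick_smallest_entries(first, second).items():
    print(f"{name}: take from archive {source} ({size} bytes)")
```

    Stored and Deflate entries are used here only to make the size difference obvious; with real tools, both archives would typically use Deflate with different settings.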

    Practice


    To show how it all works in practice, I took an iPad game, Angry Birds HD version 2.0.0, as an example. The original size of the game is 13,547,363 bytes.
    Applications             Result, bytes    Elapsed time, seconds
    advzip                   12 891 768       195
    7-zip                    12 891 143       720
    kzip                     12 877 794       2770
    7-zip + advzip           12 858 419       -
    kzip + advzip            12 849 101       -
    kzip + 7-zip + advzip    12 842 760       -

    As you can see, zipmix can slightly improve the compression ratio. Personally, when I need the maximum, I simply combine all three results (kzip + advzip + 7-zip) into one. This works much better than iterating over kzip parameters.
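
    To put "slightly" into numbers, the short script below computes, for each result in the table, the bytes saved and the size relative to the stated original size of 13,547,363 bytes. The figures are copied from the table; the helper names are hypothetical.

```python
ORIGINAL_SIZE = 13_547_363  # bytes, as stated in the article

# Archive sizes in bytes, copied from the results table.
RESULTS = {
    "advzip": 12_891_768,
    "7-zip": 12_891_143,
    "kzip": 12_877_794,
    "7-zip + advzip": 12_858_419,
    "kzip + advzip": 12_849_101,
    "kzip + 7-zip + advzip": 12_842_760,
}


def saved_bytes(size: int) -> int:
    """Bytes saved relative to the original size."""
    return ORIGINAL_SIZE - size


def ratio_percent(size: int) -> float:
    """Archive size as a percentage of the original size."""
    return 100.0 * size / ORIGINAL_SIZE


for name, size in RESULTS.items():
    print(f"{name:<24}{size:>12,}{saved_bytes(size):>10,}{ratio_percent(size):>8.2f}%")

# Gain of combining all three results over the best single tool (kzip).
gain = RESULTS["kzip"] - RESULTS["kzip + 7-zip + advzip"]
print(f"combined result is {gain:,} bytes smaller than kzip alone")
```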
