Implementation of mp3 file filter for 7-Zip archiver

For a long time, my only archiver is the 7-Zip program. I like its compression ratio and speed, so I use it almost everywhere: to compress software distributions, to archive collections of pictures, and also to store music releases. I download music from online non-profit labels, where it is most often delivered in the form of an archive (zip or rar) containing compositions and cover art. After downloading, the archives are compressed by 7-Zip and stored as such on the hard disk. To listen to music, archives do not need to be unpacked, as modern players can play music directly from them (in particular, I use Foobar2000). And although the total amount of memory occupied by all archives is still far from the capacity of the hard drive, I began to take up the thought of improving the compression ratio of mp3 files. As one of my teachers said, the task is an unsatisfied feeling of anxiety; and that was exactly the feeling I had. I did not want to invent my transcoder, so it was decided to try to write a filter that removes any redundant information.


Analysis of the conditions of the problem


By mp3 we mean MPEG-1/2 / 2.5 Layer 3. Let's see what an mp3 file is.
So, an mp3 file consists of many frames, and frames can be connected with each other, that is, to decode one frame, you may need to look at another. The file may contain tags, both before and after the audio data. Each frame stores 1152 samples (for stereo mode) and has a duration of 26 ms. The frame size depends on the used bitrate and sampling frequency, and is calculated by the following formula:


Padding is a special flag in the frame header meaning that the frame is supplemented with one byte.

The frame consists of a header, a verification code (optional), information necessary for decoding the samples (we will call this information “side info”, since I was unable to come up with an acceptable Russian translation of this phrase), actually the packed samples themselves and additional data that do not have directly related to encoded samples: A



32-bit header has the following structure:
TitleField lengthAppointment
Sync wordelevenUsed to search for a title. All 11 bits are set to "1"
MPEG version2
Layer Index2
Bit of protection1If not set, then the frame contains CRC
Bitrate Index4
Sample Rate Index2
Padding bit1The same flag signaling that the frame is supplemented with one byte
Private bit1Carries any specific information.
Channel coding mode2May be mono, stereo, joint stereo, dual channel
Mode extension2Used in joint stereo channel encoding mode
Copyright bit1If this bit is set, it means that copying the file is illegal
Original bit1If this bit is set, it means that the file is on its original media
Accent (Emphasis)2Tells the decoder that additional audio processing is required


In practice, some header fields are not used and / or remain the same throughout the mp3 file. For example, MPEG version, layer index, private bit, copyright bit, original bit, etc. If the file was created using constant bitrate (CBR), then the bitrate index field does not change.
The first thought is that it is not necessary to store all the header fields of the frame, but only those that change throughout the file. The rest, not changing fields, can be saved only once. Going through the headers of all frames, you can collect statistics and identify the fields that are actually used.

A CRC verification code may be located right behind the header. It is a 16-bit integer calculated using the CRC16 algorithm for the last 16 bits of the frame header and for all side info bytes.
The second thought is that since we know how to calculate the verification code, then we can not store it. Of course, we just have no right to just take and throw CRC out of the frame. First you need to make sure that it represents the correct value, because an error could crept into the verification code itself or the bytes by which it is calculated.

I will not dwell on the anatomy of side info in detail. Side info is a rather complex structure with many fields that change in each frame. To crank out the same trick with side info fields as with frame header fields it won't work.

The side info is followed by Huffman code-compressed audio. In fact, this is not entirely true, since Huffman codes alternate with scale factors, but for simplicity we will assume that after the side info and to the end of the frame there is audio data.

Any other data may be between the frames, if only there are no frame header signatures (Sync word) among this data.

Algorithm


Having studied the source code of 7-Zip, I came across an implementation of the BCJ2 filter, designed to improve the compression of executable files (* .exe, * .dll). A distinctive feature of this filter is that it has one input and four outputs. Each of the outputs has its own encoder. Thus, it turns out that the input file is divided into 4 streams and each stream is compressed with its own separate encoder.

I liked the idea of ​​several exits and I decided to apply it in my filter:
- pack the header fields of the frames in one stream in accordance with the first thought
- pack the side info in the second thread
- and finally pack the remaining information in the third stream: Huffman tags and codes

The first two streams are compressed using the LZMA algorithm with the settings used inside 7-Zip to compress the archive headers, and the third stream is compressed by the algorithm selected by users in the "Add to archive" window.

The encoder works like this: all the frames of the file are searched sequentially. The frame is divided into three parts: side info is sent to the side info stream, Huffman codes are sent to the main output stream, and the header fields are stored in a buffer. If a CRC is present in the frame and it turns out to be correct, then it is skipped, otherwise its value is transferred to the header stream so that it can be restored when unpacking.
The sync word (11 bits) of each header is simply thrown away. Values ​​of fields that did not change during the file are written to the header stream only once, while changing fields are written sequentially one after another.

The decoder works as a mirror to the encoder: first, all headers are restored, then the restoration of each frame begins. It is calculated or restored from the first CRC stream, followed by side info and the remainder of the frame.

Implementation and practical results


Implementation is not particularly complicated and therefore I do not see the point of bringing it here. Sources posted on github . I only note that the support for such filters (with one input and several outputs) in 7-Zip is not transparent enough and I had to go inside the 7-Zip with soap, in particular, in the 7zUpdate.cpp file. Yes, by the way, when developing the filter, the sources of 7-Zip 9.20 were used.
The compiled dll lies on the same github . She needs to replace 7z.dll in the folder where you have 7-Zip installed.

Testing of the work was carried out on a sample of 1351 files with a total volume of 16755839778 bytes (15.605 GB), on a computer running Windows XP, an Intel Core 2 6700 processor (2.66 GHz) and three gigabytes of RAM. Each file was individually compressed into an archive and unpacked. First, without using a filter, then with a filter. Everywhere the Ultra compression level was used.

 Without a filterWith filter
Total amount of compressed data16300442691 bytes (15.181 GB)16064243440 bytes (14.961 GB)
Total time spent on packaging12181.09 sec11273.47 seconds
Total time spent unpacking3215.44 sec2744.76 sec


The filter allowed to save:
- 236199251 bytes (~ 225 Mb) of memory on the hard drive
- 908 seconds (~ 15 minutes) on the package
- 407 seconds (~ 8 minutes) on the unpacking

As you can see, the packaging speed increased by 8%, and the unpacking speed on as much as 15%.

Conclusion


The filter improved the compression ratio of mp3 files by 1.5%. This result cannot be called outstanding, but also bad, because the filter works with compressed data.
That's all. Thanks for attention.

Also popular now: