Audiophile May 3, 2011 at 14:22

Lossy audio encoding: what's what?

Attention: this is an old version of the article; a new one is available on my website.

The evolution of audio coding

It is 2011, and 17 years have passed since the first MP3 encoder appeared. But the fact that most of us are still quite happily listening to music in MP3 does not at all mean that progress has stood still. This applies not only to the development of the MP3 encoding algorithm itself, but also to the evolution of lossy audio encoding in general, in the form of new, more advanced codecs that really do deliver better quality at a smaller size. Formats such as OGG Vorbis, AAC, WMA and Musepack have long since left the outdated MP3, with its many limitations and drawbacks, behind.

In parallel, lossless coding is gaining momentum. But because of the large amounts of data involved, it is still unsuitable for full-scale use today - especially on portable devices with limited memory, for streaming over the network, and simply for quickly exchanging music on the Internet (admittedly, not everyone always has 100-megabit Internet access at hand).

So, MP3 is out of date, and a replacement for it is long overdue. But what should an uninitiated user do who wants the highest sound quality at the lowest cost in memory? After all, there are quite a few alternative codecs (at least three of them really deserve attention): Apple is promoting the AAC format (Advanced Audio Coding, positioned as the successor to MP3) through its iTunes Store, Microsoft is promoting its own licensed WMA (Windows Media Audio), OGG Vorbis is steadily gaining popularity, and the especially enlightened even use a format called Musepack. Which of these codecs should you choose?

There is no definite answer to this question - and that is why I am writing this article.

How to decide?

The choice of codec depends on the specific task, namely:

1. On the hardware and software that will play the audio, i.e. on whether a particular audio format is supported, as well as on the playback quality (it is advisable to be guided by it when choosing a bitrate).

2. On the amount of memory that will be allocated for the final material. A higher or lower target bitrate / quality is selected accordingly.

And of course, in addition to the format and bitrate, you need to select the optimal encoder and encoding parameters. Keep in mind that different formats and encoders behave differently in different bitrate ranges.

Thus, the algorithm is approximately the following:

1) Find out what formats the target device supports.
2) Decide how much space you can allocate for audio material, as well as determine the total duration of the audio intended for encoding.
3) Calculate the desired bitrate by the formula: bitrate = disk_space (in kilobits) / total_duration (in seconds).
4) Based on the bitrate, select the best of the supported formats (more on this later).
5) Choose the best encoder and its parameters.
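
As a worked example of the formula in step 3, here is a minimal sketch. The function name is illustrative, and it assumes decimal units (1 MB = 1,000,000 bytes = 8000 kilobits); if your device reports space in binary megabytes, the result will be slightly higher.

```python
def target_bitrate_kbps(disk_space_mb: float, total_duration_s: float) -> float:
    """Average bitrate (kbit/s) that fits the audio into the given space."""
    # Assumption: decimal megabytes, so 1 MB = 8000 kilobits.
    disk_space_kbit = disk_space_mb * 8000
    # bitrate = disk_space (in kilobits) / total_duration (in seconds)
    return disk_space_kbit / total_duration_s


# Example: 4000 MB of free space for 50 hours of music.
rate = target_bitrate_kbps(4000, 50 * 3600)
print(round(rate))  # 178
```

With these example numbers the result is about 178 kbit/s, which corresponds to the ~160-192 kbps range discussed further on.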

More about our heroes

AAC

Advances in data compression methods and psychoacoustics gradually made the MP3 standard too “cramped” for implementing new ideas in audio coding. As a result, by 1997 the Fraunhofer Institute (IIS), which had created MP3 in the early 90s, together with Dolby, AT&T, Sony and Nokia, developed a new audio compression method - Advanced Audio Coding (AAC), which became part of the MPEG-2 and MPEG-4 standards. The main differences from the MP3 standard are:

support for a wider range of channel configurations (up to 48 channels) and sampling frequencies (from 8 kHz to 96 kHz);
a simpler and more efficient filter bank: the hybrid MP3 filter bank has been replaced by a plain MDCT (modified discrete cosine transform);
a wider range of time-frequency resolution in the filter bank - a factor of eight (in MP3 - a factor of three) - which improves the coding of both transients and stationary sections of the audio signal;
better coding of frequencies above 16 kHz;
a more flexible stereo coding mode that can switch to M/S (“joint stereo”) independently in different frequency bands;
additional tools in the standard that increase compression efficiency: temporal noise shaping (TNS), prediction of MDCT coefficients over time (long term prediction), a parametric coding mode for stereo signals (parametric stereo), noise synthesis (perceptual noise substitution), and high-frequency reconstruction (spectral band replication, SBR).
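
To illustrate the M/S idea from the list above, here is a minimal sketch of the mid/side transform. It is not the AAC implementation: a real encoder applies the transform to MDCT coefficients band by band, while this sketch works on plain time-domain samples for simplicity, and the function names are illustrative.

```python
def ms_encode(left, right):
    """Convert left/right samples into mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Restore left/right samples from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, 0.25, -0.5]
right = [0.5, 0.25, -0.25]
mid, side = ms_encode(left, right)
print(side)                  # [0.0, 0.0, -0.125]
print(ms_decode(mid, side))  # ([0.5, 0.25, -0.5], [0.5, 0.25, -0.25])
```

When the two channels are similar, the side signal is close to zero and costs very few bits, which is why being able to switch the mode per frequency band pays off.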

Thanks to these features, the AAC standard achieves more flexible and efficient, and therefore better, audio coding. Because MP3 is so widespread, AAC has not yet gained comparable popularity. However, AAC is the main format of the popular iTunes Store, the iPod, iTunes, the iPhone, PlayStation 3, Nintendo Wii and DAB+ / DRM digital broadcasting.

OGG Vorbis

Ogg Vorbis is a relatively new universal audio compression format, officially released in the summer of 2002. It belongs to the same family as MP3, AAC, VQF and WMA, that is, to lossy compression formats. The psychoacoustic model used in Ogg Vorbis works on the same principles as that of MP3 and its ilk, but the mathematical processing and practical implementation of the model are fundamentally different, which allows the authors to declare their format completely independent of all predecessors.
The main, undeniable advantage of the Ogg Vorbis format is that it is completely open and free. Moreover, it uses the newest and best psychoacoustic model, so it needs a noticeably lower bitrate than other formats for the same quality. As a result, the sound quality is better while the file size is smaller.
The format has many other advantages. For example, Ogg Vorbis does not limit the user to just two audio channels (left and right stereo). It supports up to 255 individual channels with a sampling frequency of up to 192 kHz and a resolution of up to 32 bits (which no other lossy compression format allows), which makes Ogg Vorbis great for encoding 6-channel DVD-Audio sound. In addition, the format is sample accurate: the audio data before encoding and after decoding have no offsets and no added or lost samples relative to each other. This is easy to appreciate when you encode gapless (non-stop) music, where one track flows into the next - the continuity of the sound is preserved.
Streaming support will surprise no one, but this format has had it from the very beginning. This gives the format a rather useful side effect: you can store several tracks, each with its own tags, in one file. When such a file is loaded into a player, all the tracks should appear as if they had been loaded from separate files.
The fairly flexible tag system deserves a separate mention. The tag header is easily extensible and can hold text of any length and complexity (for example, lyrics), interspersed with images (for example, a photo of the album cover). Text tags are stored in UTF-8, which lets you write in any number of languages at once and eliminates possible encoding problems. This is much more convenient than the assorted tricks of ID3 tags.
Ogg Vorbis uses a variable bitrate by default; its values are not restricted to any fixed steps and can change by as little as 1 kbps. The maximum bitrate is not strictly limited by the format either, and at the highest encoding settings it can vary from 400 kbps to 700 kbps. The sampling rate is just as flexible: users can choose any value in the range from 2000 Hz to 192000 Hz.
Ogg Vorbis was developed by the Xiphophorus community (now the Xiph.Org Foundation) to replace all paid, proprietary audio formats. Although it is the youngest of MP3's competitors, Ogg Vorbis is fully supported on all well-known platforms (Windows, PocketPC, Symbian, DOS, Linux, MacOS, FreeBSD, BeOS, etc.) and has a large number of hardware implementations. Its popularity today far exceeds that of all alternative solutions.
It is worth noting that Ogg Vorbis is just a small part of the Ogg Squish multimedia project, which also includes other free codecs: Speex for voice compression, FLAC for lossless audio compression, and Theora for video compression.

Musepack

Musepack (mpp, mp+, mpc, MPEG+) is a license-free file format for storing audio, distributed under the GNU General Public License.
The MPC encoding quality at high bitrates (160 kbps and higher) is noticeably (if not significantly) higher than that of MP3.
The main advantages:

Since the format does not perform a second DCT transform, it practically does not suffer from pre-echo artifacts, unlike formats such as MP3, Vorbis, AAC and WMA.
More efficient variable bitrate algorithms. If you watch how the bitrate changes while MPC tracks are playing, you will notice that the encoder allocates a lower bitrate to simpler sections and a significantly higher one to complex sections, sometimes above 400 (!) kbit/s. One interesting fact is worth mentioning here: an MP3 encoder in VBR mode allocates 32 kbit/s to silence (at a sampling frequency of 44100 Hz), AAC and OGG Vorbis allocate 2 kbit/s, while Musepack encodes silence at minimal cost, <1 kbit/s (for example, a minute of silence takes a mere 514 bytes). All this speaks to the extreme “thrift” of this encoder.
A powerful and flexible psychoacoustic model. One example is a dynamic, per-frame low-pass filter (other encoders set a fixed bandwidth for each quality preset).
More advanced compression based on optimized Huffman tables (LAME MP3, for comparison, wastes about 20% of the bitrate purely because of imperfect mathematical compression).
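
To put the silence figures quoted above on one scale, here is a small arithmetic sketch. It only converts the numbers already given in the text; the helper name is illustrative, and it assumes 1 kbit = 1000 bits.

```python
def bytes_per_minute(bitrate_kbps: float) -> float:
    """Size in bytes of one minute of audio at a constant bitrate."""
    # Assumption: 1 kbit = 1000 bits; 8 bits per byte; 60 seconds.
    return bitrate_kbps * 1000 / 8 * 60


mp3_silence = bytes_per_minute(32)     # 240000.0 bytes
vorbis_aac_silence = bytes_per_minute(2)  # 15000.0 bytes
mpc_silence = 514                      # bytes, figure quoted in the text

# Effective Musepack bitrate for that minute, in bit/s.
mpc_bits_per_second = mpc_silence * 8 / 60

print(mp3_silence, vorbis_aac_silence)   # 240000.0 15000.0
print(round(mpc_bits_per_second))        # 69
print(round(mp3_silence / mpc_silence))  # 467
```

In other words, by these figures a minute of silence in MP3 is several hundred times larger than the same minute in Musepack.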

WMA

Windows Media Audio is a licensed file format developed by Microsoft for the storage and broadcast of audio information.

Initially, the WMA format was advertised as an alternative to MP3, but today Microsoft positions it against the AAC format. Nominally, WMA offers good compression, which lets it “outperform” MP3 and compete on paper with Ogg Vorbis and AAC. But as independent tests and subjective assessment show, the quality of these formats is still far from equivalent, and even the advantage over MP3 is not as clear-cut as Microsoft claims.

Selection of format, encoder and parameters

Now, straight to the point.

To make your choice easier, I would like to share my experience, gained through numerous comparisons and listening sessions and also based on an analysis of the results of public listening tests.

Below I will describe the most suitable encoders for each case, as well as the right choice of parameters. For conversion I recommend foobar2000 (the converter settings are described in detail here); the parameters are given specifically for it. In addition, foobar2000 has a large number of useful DSPs that can come in handy for audio preprocessing.

For those who are going to convert from the console or with another program: the %s variable must be replaced with the name of the source file (or an equivalent variable), and %d with the name of the output file.
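
For console use, the substitution could be scripted roughly as follows. This is only a sketch: `encoder.exe` is a placeholder for whatever encoder executable you actually use, the function name is illustrative, and parameter lines that read from standard input (a lone `-`) would additionally need the decoded audio piped into the process.

```python
import re
import shlex


def build_command(encoder, params, source, output):
    """Turn a parameter line from this article into an argument list."""
    mapping = {"%s": source, "%d": output}
    # Split first, substitute afterwards, so file names with spaces stay intact.
    args = shlex.split(params)
    # Replace %s / %d in a single pass; other variables (e.g. %r) are left as-is.
    filled = [re.sub(r"%[sd]", lambda m: mapping[m.group(0)], a) for a in args]
    return [encoder] + filled


cmd = build_command(
    "encoder.exe",  # placeholder executable name
    "-silent -a_codec WMA9STD -a_mode 3 -a_setting 128_44_2 -input %s -output %d",
    "my track.wav",
    "my track.wma",
)
print(cmd)
```

The resulting list can be passed to `subprocess.run(cmd)` without any extra quoting.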

Please note that for each bitrate range the possible formats are listed in order of priority, with the first being the most preferable. If your player does not support the first option, look at the next one, and so on. As I already wrote, only three codecs really deserve attention today: AAC, OGG Vorbis and Musepack. WMA, because of its closed nature, does not stand out for quality, but in most cases it is still better than MP3. Given that some devices support only WMA among the alternatives, I will give recommendations for all four formats.
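
The priority rule can be sketched as a simple lookup. The table below just transcribes the order of the options in the sections that follow (music options only); the dictionary and function names are illustrative, not part of any tool.

```python
# Formats per bitrate range (kbps), in the order they are recommended below.
PRIORITY = {
    "~25-40": ["AAC", "Vorbis", "WMA"],
    "~64": ["AAC", "Vorbis", "WMA"],
    "~80-100": ["Vorbis", "AAC"],
    "128": ["AAC", "Vorbis", "WMA"],
    "~160-192": ["Musepack", "AAC", "Vorbis", "WMA"],
    "~200-225": ["Musepack", "AAC", "Vorbis", "WMA"],
    "~320-350": ["Vorbis", "Musepack", "AAC", "WMA"],
}


def pick_format(bitrate_range, supported):
    """Return the highest-priority format the player supports, or None."""
    for fmt in PRIORITY[bitrate_range]:
        if fmt in supported:
            return fmt
    return None


# A player that handles only MP3, AAC and WMA:
print(pick_format("~160-192", {"MP3", "AAC", "WMA"}))  # AAC
# A WMA-only player has no recommended option in the ~80-100 range:
print(pick_format("~80-100", {"WMA"}))  # None
```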

About bitrates: you need to understand that the optimal encoding mode is so-called True VBR, i.e. a mode that targets a given quality rather than a given bitrate. Ideally, the result is a track with a variable bitrate but constant quality (do not equate these two concepts - more complex fragments of a track need more bits to maintain quality). The output bitrate is therefore hard to predict, so the bitrate values below are only approximate: where possible, they are averages over a large number of tracks of varying complexity.

The encoders mentioned in this article, as well as some others, can be found here, with Russian descriptions of the main parameters and recommendations.

Ultra low bitrates (~25-40 kbit/s)

This range is great for encoding audiobooks. And here there can be only one option - AAC, or more precisely, Nero AAC. The parameters are as follows:

-lc -q 0.35 -ignorelength -if - -of %d

In this case, the material must first be converted to mono and resampled to 22050 Hz (preferably with the SoX resampler). The output is ordinary Low Complexity AAC with a bitrate of about 25 kbps.

For music in this range there are several options:

1) Nero AAC. No conversion is needed:

-q 0.15 -ignorelength -if - -of %d

The output is High Efficiency AAC v2 (with parametric stereo and HF synthesis), ~35 kbps. A great option for, say, an Internet radio station. Just do not forget that the decoder in the player must support HE-AACv2; otherwise you will get mono sound with no treble at all.

2) OGG Vorbis AoTuV - this modification of libvorbis improves the coding algorithm at low bitrates and, even without SBR technology, is not far behind HE-AACv2. Command line:

-s %r -Q -q-2 - -o %d

Files produced this way should be fully compatible with standard OGG Vorbis decoders. The bitrate is similar, about 35 kbps.

3) WMA 10 Pro. For such cases Microsoft also has something like SBR (high-frequency synthesis), and it does not sound as bad as it could. True, the bitrate is slightly outside this range - 48 kbit/s.

-silent -a_codec WMA9PRO -a_mode 3 -a_setting 48_44_2_16 -input %s -output %d

Note that old decoders (especially hardware ones) do not support WMA 10. In that case you can use WMA 9.2 (with the same encoder), but its quality at low bitrates is much worse.

-silent -a_codec WMA9STD -a_mode 3 -a_setting 48_44_2 -input %s -output %d

Low bitrate, ~64 kbps

Initially, I planned to move straight on to higher bitrates. But since a comparison of encoders at this bitrate was held quite recently at hydrogenaudio.org, it would be a sin to skip it.

1) QuickTime AAC - the winner of that test (not counting the brand-new Opus / CELT). Below are the settings for the QAAC encoder:

-s -v 64 --he -q 2 --ignorelength - -o %d

The output is HE-AAC (with SBR, but without Parametric Stereo), which should be supported by the various iPods and similar devices.

2) OGG Vorbis AoTuV - it finished quite far behind QAAC, but still:

-s %r -Q -q0 - -o %d

3) And just in case, WMA 10 Pro:

-silent -a_codec WMA9PRO -a_mode 3 -a_setting 64_44_2_16 -input %s -output %d

For older decoders - WMA 9 Standard:

-silent -a_codec WMA9STD -a_mode 3 -a_setting 64_44_2 -input %s -output %d

A little higher, ~80-100 kbps

I include this bitrate range specifically because of Vorbis.

1) As tests have shown, the OGG Vorbis AoTuV encoder handles it best:

-s %r -Q -q1 - -o %d

2) Nero AAC gives a very good result. On material where the highs are not very pronounced, it can sound even better than Vorbis (on the highs it loses because of the synthesis).

-q 0.30 -ignorelength -if - -of %d

The profile used is HE-AAC.

De facto standard, 128 kbps

An interesting fact: many claim that for MP3, 128 kbps is the “threshold bitrate” at which quality indistinguishable from the original begins. Perhaps that is true ... for cheap plastic Chinese speakers. In reality, this threshold lies somewhere around 200 kbit/s, and the newer formats give more consistent quality at this bitrate.

Modern encoders have managed to lower this 128 kbps bar almost by half (again, according to their developers). Nevertheless, if you have more or less decent speakers (or headphones), you can catch the difference on complex fragments even at 128 kbps.

1) Nero AAC:

-q 0.40 -ignorelength -if - -of %d

The profile is regular AAC LC.

2) OGG Vorbis AoTuV:

-s %r -Q -q2.8 - -o %d

3) WMA 10 Pro:

-silent -a_codec WMA9PRO -a_mode 3 -a_setting 128_44_2_24 -input %s -output %d

For older decoders - WMA 9 Standard:

-silent -a_codec WMA9STD -a_mode 3 -a_setting 128_44_2 -input %s -output %d

~160-192 kbps

In this range, the difference between the Nero, QuickTime AAC and Vorbis encoders is almost nil. But this is where Musepack itself comes on the scene. It is at these bitrates that its advantage begins to show (thanks to its unusually flexible VBR mode, as well as a fundamentally different compression algorithm):

1) Musepack:

--silent --quality 5 - %d

2) Nero AAC:

-q 0.50 -ignorelength -if - -of %d

3) OGG Vorbis AoTuV:

-s %r -Q -q5 - -o %d

4) WMA 9 Standard:

-silent -a_codec WMA9STD -a_mode 3 -a_setting 160_44_2 -input %s -output %d

Transparency Threshold: ~200-225 kbps

This is the threshold I was talking about. At this bitrate almost all encoders produce sound that is transparent to most listeners, and it is this range that is optimal in terms of size / quality.

By the way, LAME MP3 also has a similar threshold in this region (VBR V2), but that codec has very serious problems with pre-echo (distortion preceding sharp bursts in the signal), and its noise shaping is often audible (this is how noise from quantization errors gets moved into the high-frequency region).

With codecs like Vorbis, AAC and MPC, on the other hand, this threshold is where even the background noise in a recording starts to be rendered cleanly.

1) Musepack:

--silent --quality 6 - %d

2) Nero AAC:

-q 0.55 -ignorelength -if - -of %d

3) OGG Vorbis AoTuV:

-s %r -Q -q6 - -o %d

4) WMA 10 Pro:

-silent -a_codec WMA9PRO -a_mode 3 -a_setting 192_44_2_24 -input %s -output %d

WMA 9 Standard, at the maximum bitrate accepted by older decoders:

-silent -a_codec WMA9STD -a_mode 3 -a_setting 192_44_2 -input %s -output %d

Reasonable Maximum: ~320-350 kbps

I must draw your attention to this: above ~225 kbit/s, increasing the bitrate usually gives no audible gain in quality, while the file size naturally grows. Still, for particularly complex material (and good equipment / ears) there are higher quality settings. At these bitrates, for encoders such as Musepack and Vorbis, I could not even find killer samples (problem samples on which the flaws of the encoding algorithm show up clearly). So:

1) OGG Vorbis AoTuV:

-s %r -Q -q9 - -o %d

2) Musepack:

--silent --quality 10 - %d

3) QAAC:

-s -V 127 -q 2 --ignorelength - -o %d

4) WMA 10 Pro:

-silent -a_codec WMA9PRO -a_mode 3 -a_setting 384_44_2_24 -input %s -output %d

To anticipate your questions: yes, some of these encoders have even higher quality settings, but raising them further no longer makes any sense - unless the amount of memory taken up by the music really does not matter to you and your device has no lossless support.

That, in fact, is all I wanted to share with you. Try it out, comment, ask questions.
