Codec 2 + neural network = whole podcast on one floppy
In the previous article, we discussed the Opus codec, which works at very low bit rates. Another codec goes even lower: Codec 2.
Codec 2 is designed to encode speech only. Although its bit rate is impressive, the sound quality falls short of Opus, as the audio samples show. Combined with a neural network (WaveNet), however, the codec produces impressive results.
WaveNet Neural Layers
Introduction
Codec 2 is an open-source codec intended for speech coding. It targets bit rates from 700 to 3200 bps.
The developer is David Rowe, an electronics engineer currently living in South Australia. He began the project in September 2009 with the aim of improving low-cost radio communications for people in remote areas of the world. To this end, he set out to develop a codec that would significantly reduce file size and the bandwidth required for streaming.
Another motivation, according to David, was to create a patent-free codec as an alternative to proprietary codecs, which, in his opinion, “require expensive and awkward licenses and stifle innovation”. He believes that it is possible to do without patented codecs, so he distributes all of his work under a free license.
Potential application
The author cites various applications for the codec, among them VoIP, voice communication over narrow-band digital HF/UHF radio (especially amateur radio, where it avoids the problems of using proprietary codecs), and communications in developing countries and remote regions, including for the army, police and rescue services.
We at Auphonic are interested in using the codec for better compression of podcasts, presentations and audiobooks, which would reduce the storage space they take up and lessen the impact of bad network connections.
How it works
To lower the bit rate, speech has to be reduced to the minimum amount of information needed to describe it, that is, as little redundant data as possible should be transmitted.
For this, Codec 2 uses harmonic sinusoidal speech coding. It divides speech into segments of 10 to 30 ms, called frames. Each frame is analyzed for its fundamental frequency (pitch) and for the number of harmonics that fit into the 4 kHz bandwidth. Then, for each harmonic in the 4 kHz range, the amplitude and phase are recorded.
This information is then encoded, and the decoder reconstructs the sound from this data.
Codec 2 block diagrams: encoder (left) and decoder (right). Illustration: Rowetel
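The frame model described above can be illustrated with a minimal sketch. It is not the Codec 2 implementation: it assumes an 8 kHz sample rate (which gives the 4 kHz bandwidth), the function names are hypothetical, and it leaves out quantization and everything else the real codec does. It only shows how a pitch plus per-harmonic amplitudes and phases are enough to rebuild a frame.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed, so that the usable bandwidth is 4 kHz
BANDWIDTH = 4000    # Hz, as stated in the text


def harmonic_count(pitch_hz):
    """Return how many harmonics of the pitch fit into the 4 kHz band."""
    return int(BANDWIDTH // pitch_hz)


def synthesize_frame(pitch_hz, amplitudes, phases, frame_ms=20):
    """Rebuild one frame as a sum of sinusoids at multiples of the pitch."""
    n_samples = SAMPLE_RATE * frame_ms // 1000
    frame = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        # Harmonic k+1 sits at (k+1) times the fundamental frequency.
        sample = sum(
            a * math.cos(2 * math.pi * (k + 1) * pitch_hz * t + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases))
        )
        frame.append(sample)
    return frame
```

Under these assumptions, a voice with a 120 Hz pitch has 33 harmonics below 4 kHz, so a 20 ms frame is described by the pitch and 33 amplitude/phase pairs instead of 160 raw samples.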
Audio examples and comparison with other codecs
It all sounds great in theory, but what about reality? Let's listen. Here is a short WAV file:
intro-orig.wav - 1.3 MB
Now we apply Codec 2 (without the WaveNet decoder) at the available bit rates: 3200 bps, 2400 bps, 1600 bps, 1200 bps and 700 bps.
These examples show a significant reduction in file size.
Let's look at how much space each bit rate needs to store 1 hour of sound:
- At 3200 bps, one hour of sound requires only 1.37 MB (it fits on one old 3½-inch floppy disk!)
- 2400 bps corresponds to 1.03 MB/h
- 1600 bps corresponds to 0.68 MB/h (about two hours of sound on a single floppy disk!)
- 1200 bps corresponds to 0.51 MB/h
- 700 bps corresponds to 0.3 MB/h
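The figures in this list follow directly from the bit rate. Here is a small sketch of the arithmetic, assuming that "MB" means binary megabytes (1024 × 1024 bytes), that a floppy disk holds 1,474,560 bytes, and that container overhead is ignored; the function names are illustrative only.

```python
SECONDS_PER_HOUR = 3600
BYTES_PER_MB = 1024 * 1024  # assumption: the listed "MB" are binary megabytes
FLOPPY_BYTES = 1_474_560    # assumption: a standard "1.44 MB" 3.5-inch disk


def mb_per_hour(bitrate_bps):
    """Storage needed for one hour of audio at a constant bit rate."""
    return bitrate_bps * SECONDS_PER_HOUR / 8 / BYTES_PER_MB


def hours_per_floppy(bitrate_bps):
    """Hours of audio that fit on one floppy disk at that bit rate."""
    return FLOPPY_BYTES / (bitrate_bps * SECONDS_PER_HOUR / 8)


for bps in (3200, 2400, 1600, 1200, 700):
    print(f"{bps} bps: {mb_per_hour(bps):.2f} MB/h, "
          f"{hours_per_floppy(bps):.2f} h per floppy")
```

With these assumptions the results agree with the list to within rounding, and 3200 bps comes out at just over one hour per disk.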
The compression is very strong, but the result clearly sounds unnatural.
For comparison, here is the same sound as an MP3 at 8 kbps.
The file is significantly larger than with Codec 2, and the quality is arguably still unacceptable. You can clearly hear what is sometimes called sizzle: the strange metallic sounds typical of low-quality MP3s.
There is also a more recent codec to compare with, one that seems to combine the best of both worlds by delivering acceptable quality at a low bit rate: Opus.
Thanks to its convincing performance at low bit rates, Auphonic already offers Opus encoding down to 6 kbps, the lowest bit rate the codec supports.
At 6 kbps, Opus sounds much better than the 8 kbps MP3. The voice is a bit muffled, but still sounds natural.
Returning to Codec 2: purely out of interest, let's hear how it copes with music! (Keep in mind that Codec 2 is not intended for encoding music, only speech.)
The original
MP3 file at 8 kbps.
I personally cannot listen to MP3 at such a bit rate, so let's look at the results of Codec 2 at 3200 bps, 2400 bps, 1600 bps, 1200 bps and 700 bps.
Clearly, it is not suited to this purpose!
Codec 2 and WaveNet
As we have heard, despite the impressive compression, the result does not sound very natural.
Things get more interesting with the work of W. Bastiaan Kleijn and colleagues, published via the Cornell University Library. They used Codec 2 at 2400 bps for encoding, but replaced the Codec 2 decoder with WaveNet, a generative deep learning model (for more information, see “Wavenet based low rate speech coding”).
Here are some examples from the authors:
- Male voice: Codec 2, source file, with WaveNet decoder
- Female voice: Codec 2, source file, with WaveNet decoder
Compared to plain Codec 2, we hear a significant improvement in quality, and compared with the original, there is no significant loss of quality.
David Rowe himself described the result as a “dramatic improvement in speech coding at low bit rates” and compared it to “a good wideband speech codec of 8000 bps.”
Conclusion
Although the original Codec 2 is a very interesting piece of work, its scope is limited and the end result is not suitable for podcasting. The audio examples also make it clear that it can compress only voice, not music.
Nevertheless, Codec 2 combined with the WaveNet decoder sounds significantly better, and its low bit rate (2400 bps) is extremely interesting for distributing podcasts and audiobooks: only 1.03 MB of space is required for one hour of sound! Auphonic will add Codec 2 as an output format once the WaveNet decoder is available in an easy-to-use form. So far, we have added Codec 2 support for input files only.