How modern image codecs compress sound: JPEG2000 vs MP3
In this experiment, the popular JPEG2000 image compression format is used for an unusual task: storing an audio file.
In general, sound and images are very similar. If we represent sound as a waveform, we get the change of the sound signal over time. Similarly, if we take one row of image pixels, we get the change of brightness over distance.
The larger the amplitude of the sound signal's oscillations, the louder the sound. The image analogue is higher contrast.
The faster the sound signal changes, the more high frequencies the sound contains. Similarly, rapid changes of brightness along a row of pixels indicate a lot of fine detail in the image.
Moreover, both the sound signal and the pixel brightness along a row usually change smoothly enough for a codec to exploit this property.
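To make the smoothness argument concrete, here is a minimal Python sketch (NumPy assumed) that measures the average jump between neighboring samples. The test signals, a 220 Hz tone and white noise at an assumed 44.1 kHz sampling rate, are hypothetical and not taken from the experiment. A smooth signal moves only a little from one sample to the next, which is the kind of redundancy a codec can exploit.

```python
import numpy as np

def mean_abs_step(x: np.ndarray) -> float:
    """Average absolute difference between neighboring samples (or pixels in a row)."""
    return float(np.mean(np.abs(np.diff(x.astype(np.float64)))))

SAMPLE_RATE = 44_100  # assumed sampling rate for the synthetic signals
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE              # one second of time stamps
smooth = 10_000 * np.sin(2 * np.pi * 220 * t)         # hypothetical 220 Hz tone
noise = np.random.default_rng(0).uniform(-10_000, 10_000, SAMPLE_RATE)  # white noise, same range

print(f"tone:  {mean_abs_step(smooth):.0f}")  # small steps: neighbors are close
print(f"noise: {mean_abs_step(noise):.0f}")   # large steps: neighbors are unrelated
```

With these parameters the tone's average step is roughly 200 units, while the noise's is several thousand.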
One small problem remains. Sound is a one-dimensional signal, while an image is two-dimensional. We can think of a sound file as one long row of pixels, and of an image as many rows of pixels stacked together. However, in an image adjacent rows of pixels are very similar, so the sound has to be cut into rows in a way that preserves this similarity.
Sound has a suitable property for this: the fundamental frequency. On top of it there is a set of harmonics, each of which fits a whole number of times into the period of the fundamental wave. If you cut the sound signal into pieces one fundamental period long and stack them, neighboring pieces will be similar to each other.
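The article does not say how the fundamental period was found. As a hypothetical illustration only, the Python sketch below picks the lag with the highest normalized autocorrelation, then cuts a synthetic signal (a fundamental with a 100-sample period plus two harmonics) into rows. Rows exactly one period wide line up almost perfectly, while rows a few samples too wide drift against each other. Real recordings are much messier than this toy signal.

```python
import numpy as np

def estimate_period(x: np.ndarray, min_lag: int, max_lag: int) -> int:
    """Return the lag in [min_lag, max_lag] with the highest normalized autocorrelation.

    A simple stand-in for finding the fundamental period; real music needs more
    care (octave errors, changing pitch).
    """
    x = x.astype(np.float64) - np.mean(x)
    best_lag, best_r = min_lag, -np.inf
    for lag in range(min_lag, max_lag + 1):
        a, b = x[:-lag], x[lag:]
        r = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag

def mean_row_difference(x: np.ndarray, width: int) -> float:
    """Cut x into rows of `width` samples and measure how much neighboring rows differ."""
    rows = x[: len(x) // width * width].reshape(-1, width)
    return float(np.mean(np.abs(np.diff(rows, axis=0))))

# Hypothetical test signal: fundamental with a 100-sample period plus two harmonics.
period = 100
n = np.arange(50 * period)
signal = (np.sin(2 * np.pi * n / period)
          + 0.5 * np.sin(2 * np.pi * 2 * n / period)
          + 0.25 * np.sin(2 * np.pi * 3 * n / period))

p = estimate_period(signal, 50, 150)
print(p)                                   # estimated fundamental period
print(mean_row_difference(signal, p))      # rows aligned to the period: near zero
print(mean_row_difference(signal, p + 7))  # misaligned rows: much larger
```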
For the experiment, I prepared a half-minute excerpt from my favorite song, Ame Caleen - A demi-nue. As a 16-bit mono recording it takes 2570 KB.
The fundamental frequency of this file was determined experimentally. Then, as described above, the recording was cut into pieces one period of this wave long, and the pieces were stacked into an image. Pixels are stored as 16-bit grayscale, which matches the sound sample format exactly. The image is 909x1448 pixels (909 samples per row, 1448 rows).
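The conversion step can be sketched in Python as follows. The article does not describe how samples were mapped to pixels, so two details here are assumptions: signed samples are shifted to unsigned values by adding 32768, and pixels are written big-endian (ImageMagick's `-endian` option can make the byte order explicit on the `convert` side). The helper `wav_file_to_gray` is hypothetical and expects a 16-bit mono WAV file.

```python
import wave
import numpy as np

WIDTH, HEIGHT = 909, 1448  # image size from the article: one fundamental period per row

def pcm_to_gray16(samples: np.ndarray, width: int = WIDTH) -> bytes:
    """Pack signed 16-bit samples into raw 16-bit grayscale rows of `width` pixels."""
    rows = len(samples) // width                                # drop the incomplete last period
    shifted = samples[: rows * width].astype(np.int32) + 32768  # -32768..32767 -> 0..65535
    return shifted.astype(">u2").tobytes()                      # assumed big-endian pixel order

def gray16_to_pcm(raw: bytes) -> np.ndarray:
    """Inverse of pcm_to_gray16: raw 16-bit grayscale back to signed samples."""
    return (np.frombuffer(raw, dtype=">u2").astype(np.int32) - 32768).astype(np.int16)

def wav_file_to_gray(wav_path: str, gray_path: str, width: int = WIDTH) -> int:
    """Hypothetical helper: convert a 16-bit mono WAV to a raw .gray image; returns the row count."""
    with wave.open(wav_path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected a 16-bit mono WAV file")
        samples = np.frombuffer(w.readframes(w.getnframes()), dtype="<i2")
    raw = pcm_to_gray16(samples, width)
    with open(gray_path, "wb") as f:
        f.write(raw)
    return len(raw) // (2 * width)

# Size check: 909 x 1448 pixels x 2 bytes per pixel, in bytes and in KB.
print(WIDTH * HEIGHT * 2, WIDTH * HEIGHT * 2 // 1024)  # 2632464 2570
```

The size check shows that the 909x1448 image accounts for the 2570 KB quoted for the recording.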

It is very convenient that JPEG2000 supports 16-bit grayscale pixels. ImageMagick was used for JPEG2000 compression; it lets you compress the image lightly or heavily, which determines the quality of the resulting sound recording. As ImageMagick's rival, I chose the ordinary MP3 codec from Adobe Audition.
The essence of the experiment was to tune the JPEG2000 settings so that the jp2 file has the same size as the MP3 file, and then compare the quality of the resulting sound files.
I wanted to evaluate how badly quality suffers under medium and strong compression. By tuning the codec parameters, the source file was compressed to 32 KB for strong compression and to 400 KB for medium compression.
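For scale, a quick back-of-the-envelope calculation in Python. It assumes 1 KB = 1024 bytes, treats the clip as exactly 30 seconds, and reports bitrate in kbit/s (1000 bits per second); these are approximations, not figures from the article.

```python
ORIGINAL_KB = 2570  # uncompressed 16-bit mono recording, from the article
DURATION_S = 30     # "half-minute" clip; assumed to be exactly 30 s

def summarize(target_kb: float) -> tuple[float, float]:
    """Return (compression ratio, average bitrate in kbit/s) for a target file size."""
    ratio = ORIGINAL_KB / target_kb
    kbps = target_kb * 1024 * 8 / 1000 / DURATION_S
    return ratio, kbps

for label, size_kb in [("medium", 400), ("strong", 32)]:
    ratio, kbps = summarize(size_kb)
    print(f"{label}: {size_kb} KB -> {ratio:.1f}:1, about {kbps:.0f} kbit/s")
```

This gives roughly 6.4:1 and about 109 kbit/s for the 400 KB files, and roughly 80:1 and about 9 kbit/s for the 32 KB files.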
With medium compression, JPEG2000 adds clearly audible noise to the sound; otherwise, the sound is very close to the original. With strong compression, JPEG2000 produces a lot of distortion and clicks, the sound is dull, and the lows and highs are awful. Interestingly, though, unlike MP3 under similar conditions, the singer's voice comes through all the distortion much more clearly.
For strong JPEG2000 compression, an additional transformation was applied to improve sound quality: the image was downscaled before encoding and scaled back up after decoding. Reducing the image width is similar to lowering the sampling rate of the sound, and reducing the image height is something like speeding the sound up.
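The width/sampling-rate analogy can be illustrated with a small Python sketch. Since each row holds one fundamental period, resampling every row to fewer pixels keeps every period but stores it with fewer samples, much like lowering the sampling rate; scaling back up on decoding is the matching upsampling. Linear interpolation is a stand-in chosen for simplicity: ImageMagick's `-resize` uses its own filters, so the real round trip differs in detail. The 4-row test image is hypothetical.

```python
import numpy as np

def resize_rows(img: np.ndarray, new_width: int) -> np.ndarray:
    """Linearly resample every row (one fundamental period) to `new_width` samples."""
    x_old = np.linspace(0.0, 1.0, img.shape[1])
    x_new = np.linspace(0.0, 1.0, new_width)
    return np.stack([np.interp(x_new, x_old, row) for row in img])

# Hypothetical image: 4 identical periods of a smooth waveform, 909 samples per period.
width = 909
phase = np.linspace(0.0, 2 * np.pi, width, endpoint=False)
img = np.tile(np.sin(phase) + 0.3 * np.sin(2 * phase), (4, 1))

small = resize_rows(img, 454)         # encoder side: about half the samples per period
restored = resize_rows(small, width)  # decoder side: back to 909 samples per period
print(small.shape, float(np.max(np.abs(restored - img))))
```

Shrinking the height instead drops rows, that is, whole periods, which is why it resembles speeding the sound up. In the 32 KB setup the image goes from 909x1448 to 454x924, so the encoder sees 454 x 924 = 419,496 samples instead of 909 x 1448 = 1,316,232, about 32% as many.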
JPEG2000 lossless compression was also tested (i.e., compression without any distortion). The jp2 file shrank to 71% of the size of the uncompressed image. Not bad, although specialized lossless audio codecs (such as FLAC and APE) reach 40-50%.
Another result: JPEG XR lossless compression reached 81%.
The ImageMagick commands used are listed below.
400 KB compression example:
convert -depth 16 -size 909x1448 wav.txt.gray -depth 16 -type Grayscale -define jp2:rate=0.1565 tn.jp2
convert tn.jp2 -type Grayscale -depth 16 tn3.gray
For 32 KB compression:
convert -depth 16 -size 909x1448 wav.txt.gray -depth 16 -type Grayscale -resize '454x924!' -define jp2:rate=0.0325 tn.jp2
convert tn.jp2 -type Grayscale -resize '909x1448!' -depth 16 tn3.gray
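The two `jp2:rate` values are consistent with treating the rate as a fraction of the uncompressed size, which is how older ImageMagick documentation for its JasPer-based JPEG-2000 coder described it (the reciprocal of the compression ratio). Newer OpenJPEG-based builds may interpret `jp2:rate` differently, so check the documentation for your version. The Python sketch below is only an estimate under that assumption and ignores file headers.

```python
FULL = (909, 1448)    # width, height of the full-size image
SMALL = (454, 924)    # downscaled size used before the 32 KB encode
BYTES_PER_PIXEL = 2   # 16-bit grayscale

def expected_kb(size: tuple[int, int], rate: float) -> float:
    """Estimated output size if jp2:rate is a fraction of the uncompressed size (assumption)."""
    width, height = size
    return width * height * BYTES_PER_PIXEL * rate / 1024

print(f"medium: {expected_kb(FULL, 0.1565):.0f} KB")
print(f"strong: {expected_kb(SMALL, 0.0325):.0f} KB")
```

The medium setting lands at about 402 KB, very close to the 400 KB target; the strong setting on the downscaled 454x924 image gives about 27 KB, a bit under the 32 KB reported, so treat this mapping as approximate.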
Below is a link to an archive with the results. It contains the audio files produced by the JPEG2000 and MP3 codecs, and an example of a “picture with sound”.
http://depositfiles.com/files/jmd4yfdf5