3Dvideo May 28, 2019 at 08:58

Street magic of codec comparisons. Revealing the secrets

This year marks an anniversary: 16 years since the launch of the compression.ru website, where the author and his associates run comparisons of video codecs and image encoders. Over this time, dozens of comparisons have been made, with reports of 23 to 550+ pages; the number of graphs in the last comparison exceeded 7000, and the number of assorted spectacular cases has finally exceeded all reasonable limits. Since the next round date (32 years) is still a long way off, there is a desire to share a few of the spectacular ones in honor of the anniversary.

If we talk about codecs, it is no secret that most of the comparisons and graphs shown to the esteemed public are products of the marketing department. In the best case, engineers competently made the graphs, and marketing only gave the go-ahead for publication. In the worst case, the engineers took no part in preparing them at all. ~~Why waste the time of such busy people!~~

At the same time, the topic of compression is very popular. In the Silicon Valley series, the protagonist's startup developed an ingenious algorithm that showed incredible 3D video compression in the last episode of the first season. As a result, millions of startups (and investors) around the world now know that the main thing is a higher Weissman score and finding more geniuses, and the rest is nonsense. There will be a miracle! This naturally raises the expectation of miracles and, of course (of course!), companies joyfully demonstrate these miracles, including with the latest achievements of street magic.

DISCLAIMER: Any coincidence of the company names below with real names is purely accidental.

Sit back! We promise that by the end of the story you will be able to perform such tricks yourself and, for that matter, to expose many of them. Go!

Level 1, tricks for beginners

Let's start with the simplest ones because, oddly enough, these methods work in the modern (not the TV series, the real!) Silicon Valley.

So, most esteemed public, the tricks with a demonstration of super-strong compression begin!

Surely many of you have seen similar dynamic JS-based comparisons ~~with inconspicuous cats~~ on web pages. If compression is being compared, it is reasonable for the quality to be as equal as possible (ideally exactly the same), with the right-hand side compressed, say, 2 times better.

No sooner said than done!

The company claims 30% better compression (all coincidences are accidental!). And the pictures look exactly the same! Even a professionally trained eye finds no differences. There is a desire to look in more detail. We dig into the page code and see that the slider takes the data for the first and second pictures from one and the same file! This gives advantages in several ways at once: first, the best possible result is demonstrated; second, no engineer was distracted from work; and finally, this part of the page loads twice as fast. Pure profit!!!

The case, believe it or not, is real. Now you know where to look!

In another place there is also a slider, and again a wonderful result. We look into the slider code: different files are loaded. Taught by bitter experience, we download them, and they are not just the same size down to the byte, they match bit for bit! In general, this has all the advantages of the previous method, and the trick is a little harder to spot, though at the cost of slower page loading (you have to pay for everything...). And, most importantly, there is no need to hire expensive compression specialists.
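The check described above is easy to automate. Below is a minimal sketch in Python, assuming the two files behind the slider have already been downloaded; the function names `sha256_of` and `same_content` are made up for this illustration and are not part of any tool mentioned in the article.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if the two files are byte-for-byte identical (hypothetical helper)."""
    a, b = Path(path_a), Path(path_b)
    # A cheap size check first: files of different size cannot be identical.
    if a.stat().st_size != b.stat().st_size:
        return False
    return sha256_of(a) == sha256_of(b)
```

If `same_content("left.jpg", "right.jpg")` returns `True` for the "before" and "after" images of a compression demo, the demo compares a file with itself (the file names here are placeholders).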

However, more advanced marketers at this level go even further. A slider is posted; you look, and the pictures are different, but the quality is very similar. Well, OK. And there is more! Fantastic openness is demonstrated: there are even links to the video files. We download them, and their method shows a very good advantage; it is not even clear how they did it. What helps is that all of us here are, of course, Russian hackers (already a brand in the West). We look at the bitstream and see a wonderful picture:

That is, even many experts, if they do not dig deep and double-check, will confirm that these people really do have an excellent result, roughly twice as good in size as today's leader at comparable quality. Believe it or not, such methods are actually used and in some cases even help raise tens of millions of dollars of investment.

I remember a meeting with a Russian startup 6 years ago. Their director said right from the doorway: “You must do your best for us. We have investors from Severstal, and, if anything goes wrong, athletic shaven-headed guys with soldering irons will come to see you.” As you understand, under such harsh conditions the quality of research work magically rises, and the number of tricks of all levels falls... Working with such cases in our homeland of elephants, one feels an irresistible pity for Western investors. True, not all of our investors are so very down-to-earth, and there are magicians in our neck of the woods too. Regularly. But more about that another time...

Level 7, resonant

This story is not about a video codec but about image compression, yet a lot in it followed all the laws of the “honest tricks” genre.

Once upon a time, a fairly well-known company M decided that it needed to add Windows Media Photo (WMP) to its Windows Media Video (WMV) and Windows Media Audio (WMA) formats. Purely to complete the set, you understand.

Young man in the gallery! Please do not shout so loudly, you are not the only one it dawned on! Cultured people (look at the first row) at most smirked knowingly into their mustaches...

No sooner said than done!

Now watch the hands carefully:

That is, WMP preserves more detail than JPEG and JPEG 2000 at the same compression ratio (JPEG and JPEG 2000 are quietly lumped together, and the ratio is set at 24 times), and in the following paragraph:

That is, usually images are compressed only 6 times, and here it is 24. Wow, that smells like a gain of several times! In any case, we are definitely 2 times better. The media carried the good news to the masses (some wrote that it was 2 times better than JPEG 2000), and even Habr repeated this news.

A little later, a chart appeared from this presentation:

How to interpret such charts?

The vertical axis is usually quality (one of the metrics, depending on what is in fashion at the moment), and the horizontal axis is, one way or another, size. Usually quality increases with size (although in practice anything can happen). Along a line of equal quality (the red horizontal), you can estimate that the “purple” codec needs about 2 times the size of the “blue” one for the same quality in this range of bitrates.

The advantage over JPEG 2000 was small, even though they had obviously selected the best picture, with a wonderful boy and dolphins. We looked forward to playing with this encoder. About six months later, the compression utility was released.

By that time we had a comparison of 9 implementations of JPEG 2000, published about a year earlier.
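The "line of equal quality" reading can be written down as a small calculation. The sketch below is only an illustration under simplifying assumptions: each codec is given as a handful of measured (size, quality) points, and the size at the target quality is found by plain linear interpolation between neighbouring points. The function names and the sample numbers are invented for the example.

```python
def size_at_quality(points, target_quality):
    """Interpolate the size needed to reach target_quality.

    points: iterable of (size, quality) measurements for one codec.
    """
    pts = sorted(points, key=lambda p: p[1])  # order by quality
    for (s0, q0), (s1, q1) in zip(pts, pts[1:]):
        if q0 <= target_quality <= q1:
            if q1 == q0:
                return s0
            t = (target_quality - q0) / (q1 - q0)
            return s0 + t * (s1 - s0)
    raise ValueError("target quality is outside the measured range")


def size_ratio(codec_a, codec_b, target_quality):
    """How many times larger codec_a is than codec_b at equal quality."""
    return size_at_quality(codec_a, target_quality) / size_at_quality(
        codec_b, target_quality
    )


# Invented sample curves: (size in KB, quality in dB).
blue = [(100, 30.0), (200, 35.0), (400, 40.0)]
purple = [(200, 30.0), (400, 35.0), (800, 40.0)]
print(size_ratio(purple, blue, 35.0))
```

With these made-up curves the ratio is 2, i.e. "purple" needs twice the size of "blue" at the same quality. Real comparisons usually average such ratios over a range of qualities rather than reading off a single point.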

Yes, yes, yes! Just as not all yogurts are equally healthy, not all implementations of a standard are equally good. The standard specifies only the bitstream; data can be put into it (and, by the way, taken out of it!) in very different ways, and this creates a separate market of codecs with fierce competition across a good dozen parameters. The general public, as a rule, does not know this, which makes it possible to take them for a ride with impunity, practically on a bulldozer (“Our DVR supports the latest H.265 / HEVC, no one else has it!”). And no one (no one!) suspects a VERY probable setup.

We happily added 3 lines for WMP to the charts from the previous report. It turned out something like this:

You can see that the lines of the JPEG 2000 implementations are bunched together, and the bold blue line (WMP at its best setting) lands somewhere in the middle, that is, it LOSES to JPEG 2000. If you take JASPER as zero and plot everything vertically relative to it, you can see that WMP with the worst parameter loses to almost everything except the last two (one of them is KDU, remember this), and with the best one it sits somewhere in the middle, losing to many implementations:

Since the comparison was posted publicly and caused a stir in narrow circles, the developer even responded to it on the official blog. The note was polite: it praised, it criticized, and then, if you wade through the text, the man frankly admitted that in their comparison they had used the worst JPEG 2000 implementation from our comparison (published six months before), though “completely by accident”. Of course, we will believe them. A respected company and all that.

Later, the name of the technology was changed from WMP to HD Photo; however, the following verdict remained online:

As the cherry on the cake, our colleagues went further: they took more pictures and showed that HD Photo loses not only to JPEG 2000 but also to a good JPEG implementation (in 7 cases out of 14). And loses badly. There is reason to believe that they cherry-picked the pictures, but they frankly buried HD Photo, because it is unclear who needs a format that loses to the ancient JPEG half of the time:

In summary, the secrets of this trick:

We take the worst implementation of the main competitor and compare ourselves with it.
We create advertising hype (in the style of “we have overtaken everyone”).
When the hype fades into the background, we release the product and hope that no one will check what was really there.

Children! Never do this and do not deceive others! Your company may lose millions of dollars and the trust of specialists.

Level 10, fresh! With neural networks!

In general, there are a lot of such cases. Even in Russia I run into similar situations a couple of times a year (information flows to us as the owners of compression.ru). In the West, ~~suckers~~ investors are fleeced about once a month. And now China has joined this entertainment too. Computing power is growing, and so are the complexity and capabilities of algorithms. Making sense of it all is getting harder. As a result, the wild fun continues!

Recently, neural networks have become very popular. Absolutely everything they touch magically improves. So why not apply them to video compression?

No sooner said than done!

Last November, another piece of good news, from The Wall Street Journal itself, flew around the world. A video codec based on machine learning had been created, and it tore everyone apart! Here is the proof:

In general, I personally am extremely skeptical about any news that mentions neural networks. And I advise you to be as well (ESPECIALLY if you are an investor). Neural networks are built in such a way that, by suitably matching the training set to the test set, one can show any (for the slow ones: ANY!) desired result. Neural networks are an ideal tool for launching a stream of marketing wonders. Each more wonderful than the last!

So, there is a graph, there are pictures. Convincing, you will agree. Especially for the skeptics, the gentlemen provided a few more graphs on well-known test sets:

However, while the previous graph with pictures was somehow explainable to me personally (it is always possible to tune for a single video, all the more so with deep neural networks), these two graphs put me on my guard.

Does nothing bother you in them?

Answer

It follows from them that in the ten years between the adoption of the H.264 standard and the adoption of H.265, no real development of codecs took place! Those stupid researchers marked time for 10 years and made slower codecs that compress the same!!! The difference is 20% at most, or even less!!! 8-\

They even supply a rationale for it: supposedly the classic codecs have hit their limit and are hardly developing any more (and here they step onto the stage, all in white). And you know, such a blatant lie works great! Fine, The Wall Street Journal: they (I would like to believe) only understand finance. Fine, MIT Technology Review: these gentlemen take the gentlemen of Silicon Valley at their word. But that such a respected resource as Habr accepted the news uncritically, that I did not expect! What can we say about the mass reprinting of the news...

In reality, the picture of codec development is, fortunately, noticeably different. First, the chart below, which we built on the same xiph video set, shows that H.265 is 25–31% better than H.264. That is, 10 years of codec development were not in vain after all! (Phew, what a relief...) Second, the fresh AV1 shows an almost twofold improvement over H.264, and the size of its advantage is, frankly, very noticeable:

Accordingly, you can see by eye that if the AV1 curve, 45% to the left of H.264, is overlaid on the authors' chart, it will cover the new codec like... [cut by the censor]. Covers it well, in short. That is why they “forgot” to compare with it. The real picture looks something like this (much less crowded, you will agree):

To make it clear: codecs have standard presets that let you vary the speed over a wide range (often by tens of times), trading it for stronger compression at the same quality (often more than 2 times). In x265 (a very good open-source implementation of the HEVC standard) they are called ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, veryslow and placebo. If we take medium as 1, then for a particular file their speed and file size at the same quality may be distributed, for example, as in the graph below. We can say that, relative to medium, you can make the file 40% larger or smaller by varying the speed 10 times:

Note that for some videos the standard presets do not necessarily behave monotonically (in this case, in quality). Also, “non-standard” options can sometimes give a big gain in size: in the example above, by giving up 20% of speed relative to medium you can win 30% in size, almost as much as from switching to the next-generation standard, but with the low decoder complexity of the previous one. But that is a more advanced level; more about it another time.

As you can easily see above, the gentlemen took “slower” for the comparison. It is good that it was not “veryfast”, because they could have! And never mind that their own codec is spectacularly slow. Most people looking at a graph do not remember that the speed of a codec can vary by several orders of magnitude depending on the parameters. So this technique works perfectly well. Even though on our chart above (“Bitrate / quality ...”) their bundle of lines would fall in the red area (that is, the worst one). All while justifying the claim that codec development is marking time. Yeah, yeah!

There are subtler manipulations too. For example, the gentlemen write: “To remove B-frames, we use H.264 / 5 with the bframes = 0 option, VP9 with -auto-alt-ref 0 -lag-in-frames 0, and use the HM encoder lowdelay P main.cfg profile.” That is, they could not beat the usual codecs in a fair competition and chose the low-latency mode, which is normally used for real-time applications such as video conferencing. The codecs' results in that mode are worse, of course. At the same time, their own decoder (they are silent about the encoder) takes 2 seconds per frame, so there can be no talk of low latency at all. But it won them a few percent.

These are not all the tricks that were used by gentlemen startups, but the picture is already clear.

It is clear that for the trick to look believable, additional touches are needed to add realism. For example, these gentlemen published an article at https://arxiv.org/abs/1811.06981. Today algorithms develop so fast that waiting for an article to appear in a journal becomes unbearable, which is why many strong authors publish their results on arxiv.org first. For street magicians this site is convenient because absolutely any material can be placed there: unlike peer-reviewed journals and conferences, no one will ask unpleasant questions or reject the publication (reviewing at serious venues can be merciless). The general public, however, does not know that, for example, it is customary to publish various parodies of scientific articles on arxiv.org on April 1, some of them making fun of arxiv.org itself as a place of publication, so to the general public a publication there even looks respectable.

Moving on. The article on Habr about them was titled “The first video codec on the machine learning radically exceeded all existing codecs, including H.265 and vp9”. The additional joke is that machine learning in compression is not only being actively researched, with separate conference tracks already devoted to it (that is, there are a lot of articles), but is also actively used, for example, in AV1 (I deliberately give a Google query). But if they had honestly said: “We released the second codec using machine learning, while losing to the first in speed and compression,” The Wall Street Journal could not have written about them... And MIT Technology Review would not have written... And not even Habr... Apparently unable to bear that last one, the company slightly corrected its pitch. At the same time, a feature of the modern Internet is that people do not check information, which allows many (starting with famous companies) to proclaim themselves the first. Boldness, as you know, takes cities! Fact-checking is not in fashion.

- Googled!
- Is that how it is?
[example request given above)))]

One more thing about ML / DL. In the distant past, when floppy disks were large and hard drives were small, one of the methods of “street magic for archivers” was to save part of the compressed file somewhere far away in a directory with temporary files and thus show a record. Times have changed since then. Hard drives have grown, floppy disks have disappeared entirely, and it has become fashionable to hide data in the depths of several hundred megabytes of network weights. You can store a “copyright mark” in the network, you can store an easter egg, or you can set a fake compression record. Deep neural networks are definitely a force, in short!

To summarize this path to success:

We ignore the modern leader as if it did not exist at all.
We carefully word everything so that it reads as if we were the first to use some new technology (and even if the leader did it first, no one will check).
For the standards from 5 and 15 years ago, we turn the knobs so that they work worse than ours.
More arrogance: we explain the fact that they trail behind us in a bunch by saying that they have hit their limit and are no longer developing.
We get published in The Wall Street Journal and on Habr...

And... (drum roll!)... they give you a few more million dollars! Or they do not... I would not... Investors! Stay awake! Otherwise, once burned, you will again be blowing on cold water...

And now the master class!

As I promised above, by the end of this text you will easily be able to shine on the stage of some notional Pikabu.

Now, O most esteemed public, I will show you a trick that lets you compare anything with anything and arrive at any predetermined result. That is, if you want codec A to be better than codec B, we will show that; if you want B to be better than A, well, we can show that too. We will fulfill any whim of the marketing department ~~for your money~~ for free!

Let's check how these codecs compress. As they say: trust no one, check for yourself. What if it is true that these standards are not developing and we are simply being fooled into paying our hard-earned money for nothing?

Take “Avatar” in 480p24 format and compress it with the x264 codec with the settings “-preset superfast -x264-params "nal-hrd=cbr" -b:v 1M -minrate 1M -maxrate 1M -bufsize 2M” and with the xvid codec with the settings “-preset superfast -b:v 1M -minrate 1M -maxrate 1M -bufsize 2M” (two very good open-source implementations of the H.264 and MPEG-4 standards). Why these codecs and settings were chosen will be explained later.

We got two files of almost the same size:
avatar_x264_cbr1M_superfast.mkv - 1402 MB
avatar_xvid_cbr1M_superfast.mkv - 1401 MB

And now, ladies and gentlemen! Watch your hands carefully !!!

We look, here is the new standard and the old:

We look at another frame:

No comment! But what about fast motion?

Agree! Progress is clear and inexorable! Everything is developing and getting better and better! And life is more beautiful and more fun!

Although ...

God, what is this??? The new standard has failed completely... Aaah!

People! Know this! Corporations are deceiving you!!! Codecs have not been developing for a long time, but they tell you that everything is going well!

You see? All they have learned in 10 years is to blur the blockiness! And they do it simply disgustingly! It got worse than it was! You have been led by the nose all these years!!!!!!!11

And now we will figure out how to do it.

In fact, the full frame looks like this:

When a codec is working, especially in constant bitrate mode, frame quality fluctuates quite a lot. Here, for example, is the beginning of the film: quality by the classical PSNR metric (there is no doubt who is better and who is worse; by the way, it is clear that the green xvid loses on average):

If you subtract one graph from the other (the figure below shows another place in the file), you can see that on the whole the codec of the older standard loses, but in some places it can pull ahead by +5 dB. (PSNR is convenient because it depends logarithmically, and inversely, on the root-mean-square error, thanks to which a rule of thumb usually holds: a difference of 1.5 dB is visible to the eye in the range of medium and low bitrates.) And in the same place you can see a frame where the difference is 20 dB in the other direction.
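For reference, the standard definition of PSNR for 8-bit samples is 10·log10(255² / MSE), where MSE is the mean squared error between the original and the decoded frame. A minimal sketch follows; it works on flat lists of sample values to stay dependency-free, whereas real tools operate on whole frames and usually on the luma plane, and the function name `psnr` is just a label for this example.

```python
import math


def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equally sized sequences of sample values."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


# Halving the error amplitude raises PSNR by about 6 dB.
original = [0, 0, 0, 0]
print(psnr(original, [4, 4, 4, 4]))
print(psnr(original, [2, 2, 2, 2]))
```

Because of the logarithm, a 5 dB swing on a single frame corresponds to a large change in error, which is why individual frames can look so different from the average.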

Now you understand why your humble servant always looks with sincere affection at the individual frames that companies present in their marketing materials as proof of higher video quality (especially when there are no graphs)... And they still do this sometimes!

To make it easier to select frames, more than 10 years ago we added a comparison mode to our tool MSU VQMT in which 3 files are compared at once (the original, codec 1 and codec 2) and, for example, the 30 best pairs of frames in one direction or the other are saved right away. The main thing is to take a longer file!

A low-bitrate MPEG-4 was taken to make the blockiness more visible.
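The frame-picking idea itself fits in a few lines. The sketch below is not the MSU VQMT implementation, only an illustration of the principle under the assumption that per-frame PSNR values for both codecs are already available as two lists; the function name and the sample numbers are invented.

```python
def pick_showcase_frames(psnr_a, psnr_b, count=30):
    """Return (frames where A looks best, frames where B looks best).

    psnr_a, psnr_b: per-frame quality of codec A and codec B
    measured against the same original.
    """
    diffs = [(a - b, i) for i, (a, b) in enumerate(zip(psnr_a, psnr_b))]
    best_for_a = [i for _, i in sorted(diffs, reverse=True)[:count]]
    best_for_b = [i for _, i in sorted(diffs)[:count]]
    return best_for_a, best_for_b


# Invented per-frame PSNR values, in dB.
codec_a = [30.0, 35.0, 28.0, 40.0]
codec_b = [32.0, 30.0, 33.0, 39.0]
for_a, for_b = pick_showcase_frames(codec_a, codec_b, count=2)
print(for_a)  # frames to show if A must "win"
print(for_b)  # frames to show if B must "win"
```

In this toy data the average quality of the two codecs is almost identical, yet each side can produce frames with a 5 dB advantage. The longer the file, the more extreme the best pairs become.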

In summary, the path to success:

We select a mode in which the quality variation of the codecs is at its maximum (usually single-pass CBR).
We reduce the resolution of the source by 2 times (because the fragments will most likely have to be enlarged anyway; above, for example, the fragments were enlarged 3 times).
We take some metric (PSNR, SSIM, or VMAF, fashionable this season).
We compare against an old standard with blockiness, or use options to disable the competitor's internal deblocking.
And finally, we do not forget to take a longer file: a 3-hour film is just the thing!

And BINGO! You have examples of how much better you are than your competitor!

Or, somewhere the audience is not too picky, you can successfully compare anything with anything. People will be pleased.

Now you know what questions to ask when you see a comparison by individual frames in company materials! Maybe, at last, such comparisons will appear less often...

Instead of a conclusion

The above covered the relatively simple ways of preparing marketing materials “in one's own favor” in comparisons of codecs and encoders. Naturally, in real life everything is more complicated. Alas, if you go deeper, it will be less entertaining and noticeably more complex (those who wish can read the article and comments here, for example).

And people are usually interested in simple answers. The most popular answer on Otvety@Mail.ru (Mail.ru Answers) to the question “What is the best video codec?” is “K-Lite Codec Pack Mega”. And this really is the shortest, clearest and most accurate answer for a mass audience. And you talk about codecs, standards...

But the more people there are who understand the subject at least at an average level, the less brazenly marketing departments and impudent startups will be able to pull the wool over investors' eyes. And life will be a little better.

Thank you, ladies and gentlemen! Technical literacy to all!

Acknowledgments

I would like to cordially thank:

the Computer Graphics Laboratory of the CMC (VMK) faculty of Lomonosov Moscow State University, for its contribution to the development of computer graphics in Russia and beyond,
our colleagues from the video group, including Sergey Zvezdakov, Anastasia Antsiferova and Roman Kazantsev, whose examples are used above,
personally Konstantin Kozhemyakov, who did a lot to make this article better and more visual,
and finally, many thanks to Sergey Lavrushkin, Yegor Sklyarov, Ivan Molodetsky, Evgeny Lyapustin, Dmitry Kulikov, Alexandra Anzina, Vitaly Lyudvichenko, Mikhail Erofeev and Georgy Osipov for a lot of useful comments and corrections that made this text much better!
