Comparing the libtheora and x264 codecs

After YouTube and Vimeo launched their HTML5 test pages, another wave of debate started over which is better: H.264 or Ogg Theora.

Of course, I support a free web. But the conclusion many people drew from two dubious comparisons (the first and the second), that Theora beats H.264 on quality, is very hasty.
The bear is ill

The first comparison publishes neither the test video nor any codec settings. The second says that a deliberately low-quality YouTube preset was used for H.264, and says nothing at all about Theora's settings.

So I decided to check for myself what Ogg Theora is and what this codec is capable of.

Format Comparison

The first thing I looked at was the list of Ogg Theora features and, for comparison, the list of H.264 features.

What caught my attention:

The minimum block size is 8x8 (H.264 goes down to 4x4, which preserves small details better)
No arithmetic coding (which gives H.264 roughly 15 percent essentially for free)
Half-pixel motion compensation accuracy (quarter-pixel in H.264)
No B-frames

And that is just from comparing the documented capabilities of the two formats; the H.264 feature list is also much longer. So it seemed surprising to me that a clearly less advanced codec keeps winning comparisons.
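To make the motion-compensation point more concrete, here is a simplified one-dimensional sketch of sub-pixel interpolation. The H.264 part follows the standard luma filter: a 6-tap filter (1, -5, 20, 20, -5, 1)/32 for half positions and a rounded average of the neighbouring integer and half samples for quarter positions. The plain two-sample average is only a stand-in for a half-pel-only codec and is not claimed to be Theora's exact filter; all function names are hypothetical.

```python
# Simplified 1-D illustration of sub-pixel motion compensation.
# Assumption: half_pel_bilinear is a generic two-sample average standing in
# for a half-pel-only codec; it is NOT claimed to be Theora's exact filter.

def clip8(v):
    """Clamp a value to the 8-bit sample range."""
    return max(0, min(255, v))

def half_pel_bilinear(row, x):
    """Sample halfway between row[x] and row[x + 1] by rounded averaging."""
    return (row[x] + row[x + 1] + 1) >> 1

def half_pel_6tap(row, x):
    """H.264-style luma half sample between row[x] and row[x + 1].

    Uses row[x - 2] .. row[x + 3] with taps (1, -5, 20, 20, -5, 1) / 32.
    """
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * row[x - 2 + i] for i, t in enumerate(taps))
    return clip8((acc + 16) >> 5)

def quarter_pel_6tap(row, x):
    """Quarter position: rounded average of row[x] and the half sample to its right."""
    return (row[x] + half_pel_6tap(row, x) + 1) >> 1
```

For row = [0, 0, 0, 100, 100, 100] and x = 2, both half-pel functions return 50 and quarter_pel_6tap returns 25. A half-pel codec can only point at 0 or 50 here; the extra quarter positions let the predictor follow slow or fine motion more closely.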

Codec Versions

The Theora site does not offer binaries, and I couldn't build it from source =( So I found ffmpeg2theora builds instead. They use Theora 1.1.0 (libtheora 1.1 20090822 (Thusnelda)), although version 1.1.1 is already on xiph.org. The newer release only lists minor fixes, so I think that is fine. So, in the blue corner of the ring: ffmpeg2theora 0.25.

On the other side I took the x264 encoder: a fairly advanced member of the H.264 family with lots of settings and good community support. It is open source, and in the latest MSU Videogroup comparison it took second place, just slightly behind the leader. So, in the red corner of the ring: x264 r1400.

For decoding I used FFmpegSource2 version 2.12, a plugin for AviSynth.

Comparison Methodology

For the comparison I took four video sequences, each 640 pixels wide. They were encoded in two passes (which makes it much easier to hit the target size) at a bitrate of 500 kbps. Theora was set to maximum quality and the most flexible rate control. For x264 I used two presets: the first mimics Theora's capabilities (half-pixel motion vectors, no B-frames, 8x8 blocks, etc.), and the second is a normal x264 preset with all features enabled. Quality was measured with the PSNR and SSIM metrics using the MSU Video Quality Measurement Tool.
I did not measure encoding time, since equalizing the results in time as well would be a big problem. Most likely x264 would have a noticeable speed advantage anyway thanks to its assembly optimizations, as it is the more mature project.
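The MSU tool computes the metrics itself. As a rough illustration of what the PSNR numbers mean, here is a minimal pure-Python sketch for one 8-bit plane (for example the luma plane of a frame), averaged over a sequence. Averaging per-frame values is an assumption on my part: measurement tools differ in how they average and in how they treat identical frames.

```python
import math

def frame_psnr(ref, test, peak=255):
    """PSNR in dB between two equally sized 8-bit planes given as lists of rows."""
    sq_errors = [(a - b) ** 2
                 for ref_row, test_row in zip(ref, test)
                 for a, b in zip(ref_row, test_row)]
    mse = sum(sq_errors) / len(sq_errors)
    if mse == 0:
        return math.inf  # identical planes: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)

def sequence_psnr(ref_frames, test_frames):
    """Plain mean of per-frame PSNR values (one common convention, assumed here)."""
    values = [frame_psnr(r, t) for r, t in zip(ref_frames, test_frames)]
    return sum(values) / len(values)
```

A 2x2 frame with one pixel off by 10 has an MSE of 25, so its PSNR is 10 * log10(255^2 / 25), roughly 34.2 dB.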

Presets

Theora:
--soft-target --two-pass --optimize --speedlevel 0 --keyint 250

x264 analogue:
--bframes 0 --no-cabac --partitions i8x8,p8x8 --me umh --no-mbtree --no-psy --no-fast-pskip --no-dct-decimate --subme 1

x264 normal:

--bframes 4 --b-pyramid normal --partitions all --me umh --no-psy --trellis 2 --no-fast-pskip --no-dct-decimate --subme 10 --b-adapt 2 --direct auto

Theora's feature list mentions multiple reference frames, but there is no setting for them. Since I cannot control whether Theora uses several reference frames, I did not restrict x264 either and let it use the default ref=3.
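The article lists only the preset flags. As a sketch of how such two-pass runs might be scripted, the helpers below assemble the command lines at 500 kbps. The file names, the stats-file name and the null output (NUL on Windows, /dev/null elsewhere) are my assumptions; -V is ffmpeg2theora's video bitrate option, and --pass, --bitrate, --stats and -o are standard x264 options.

```python
import shlex

# Presets copied from the article.
THEORA = "--soft-target --two-pass --optimize --speedlevel 0 --keyint 250"
X264_ANALOGUE = ("--bframes 0 --no-cabac --partitions i8x8,p8x8 --me umh --no-mbtree "
                 "--no-psy --no-fast-pskip --no-dct-decimate --subme 1")
X264_NORMAL = ("--bframes 4 --b-pyramid normal --partitions all --me umh --no-psy "
               "--trellis 2 --no-fast-pskip --no-dct-decimate --subme 10 --b-adapt 2 --direct auto")

def theora_cmd(src, dst, kbps=500):
    """ffmpeg2theora handles both passes itself when --two-pass is given."""
    return ["ffmpeg2theora", *shlex.split(THEORA), "-V", str(kbps), "-o", dst, src]

def x264_two_pass(src, dst, preset, kbps=500, stats="x264_2pass.log", null="NUL"):
    """Return [pass1, pass2] argument lists; pass 1 only writes the stats file."""
    common = ["x264", *shlex.split(preset), "--bitrate", str(kbps), "--stats", stats]
    first = common + ["--pass", "1", "-o", null, src]
    second = common + ["--pass", "2", "-o", dst, src]
    return [first, second]
```

Each list can be passed to subprocess.run(cmd, check=True). The two x264 passes must run in order, because pass 2 reads the statistics written by pass 1.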

Sequences

Battle
A short clip from Terminator 2 where something is constantly happening: shooting, explosions. A very dynamic video.
Football
A short clip from a football broadcast. A typical use case, by the way.
Shuttle start
Shuttle launch, as the name implies. A static video.
Toys and calendar
A video with smooth motion and lots of small details.

Results

First, the PSNR and SSIM metrics. SSIM appeared later and is generally considered more advanced; as far as I know, SSIM results are also usually closer to the results of subjective comparisons. But just in case, I measured PSNR too.
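For readers unfamiliar with SSIM, here is a minimal sketch of the standard formula, which compares local means, variances and covariance instead of raw pixel errors. Real implementations (presumably including the MSU tool) use sliding or Gaussian-weighted windows; the non-overlapping 8x8 blocks here are a simplification, and the function names are hypothetical.

```python
def ssim_window(x, y, peak=255):
    """SSIM of two equally long sample lists treated as a single window."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # usual stabilising constants
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def frame_ssim(ref, test, block=8, peak=255):
    """Mean SSIM over non-overlapping block x block windows of two 8-bit planes."""
    h, w = len(ref), len(ref[0])
    scores = []
    for top in range(0, h - block + 1, block):
        for left in range(0, w - block + 1, block):
            rows, cols = range(top, top + block), range(left, left + block)
            xs = [ref[r][c] for r in rows for c in cols]
            ys = [test[r][c] for r in rows for c in cols]
            scores.append(ssim_window(xs, ys, peak))
    return sum(scores) / len(scores)
```

Identical planes score 1.0; a plane whose texture has been flattened away scores lower even if its average brightness is about right, which is why SSIM tends to punish lost detail more than PSNR does.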

SSIM Comparison

As you can see, Theora loses hopelessly to the normal x264 preset. Against the x264 preset cut down to Theora's feature set, it is also a loss. No miracle happened.

Now a quick walk through the individual sequences.

Battle

Here PSNR even favored Theora, and overall its lag is small. Subjectively, the bitrate was not enough even for the normal x264 preset: the video is simply too dynamic.
Screenshots of a couple of frames for comparison:

Source, battle, frame 389

An example where Theora is superior to x264
Source, battle, frame 444

Football

In Theora's encode the football field came out somewhat uneven. The x264 result is much nicer to watch.
Source, football, frame 361

Shuttle start

Both codecs handled this video about equally well, though on average x264 preserved a little more detail.
Source, shuttle_start, frame 379

Toys and calendar

Here Theora fails completely. The cut-down x264 preset neatly cuts off the high frequencies, and overall the picture is watchable. Theora shows terrible blockiness in some places and wiped-out details in others. And the normal x264 preset had enough bitrate to keep even the patterns on the wallpaper.
Source, toys_and_calendar, frame 77

x264 analogue, toys_and_calendar, frame 77

x264 normal, toys_and_calendar, frame 77

Bitrate Deviation Check

A pure formality: making sure the codecs hit the bitrate specified in the settings. A deviation of up to 5% is considered normal, and everything is in order here.
Bitrate Deviation Chart
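The check itself is simple arithmetic: file size in bits divided by duration, compared with the target. A minimal sketch follows; the frame count and frame rate in the example are hypothetical, and since the whole file size includes container overhead (Ogg for Theora), the result is slightly higher than the pure video bitrate.

```python
def actual_kbps(size_bytes, frame_count, fps):
    """Average bitrate in kbit/s (1 kbit = 1000 bits) of a file of known duration."""
    duration_s = frame_count / fps
    return size_bytes * 8 / duration_s / 1000

def deviation_pct(actual, target):
    """Signed deviation from the target bitrate, in percent."""
    return (actual - target) / target * 100

def within_tolerance(actual, target, tol_pct=5.0):
    """True if the deviation is within +/- tol_pct percent (5% as in the text)."""
    return abs(deviation_pct(actual, target)) <= tol_pct
```

For example, a 625,000-byte file holding 250 frames at 25 fps (10 seconds) averages exactly 500 kbps; a file measuring 520 kbps deviates by 4% and passes, while 530 kbps (6%) would not.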

Links to the encoding results are at the end of the article. Some types of artifacts are very noticeable in still frames but hardly visible in motion, and some are the other way around. So if you are interested, you can judge for yourself.

Size ratio at equal quality

Still, it is hard to judge the practical value of one video scoring more abstract metric points ("parrots", as we say) than another. So I also checked how small an x264 file can be while matching Theora's visual quality. The Theora preset is the same as before; for x264 I used x264 normal. Quality was matched by SSIM: for every sequence the x264 SSIM is slightly higher than Theora's, but as close as I could get it.
Here is the bitrate chart of the resulting files:
Size comparison graph for the same SSIM

The files differ in size by a factor of 2 to 4.
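The article does not say how the matching bitrates were found. One way to automate the search is bisection over the bitrate, sketched below under two assumptions: SSIM grows monotonically with bitrate, and measure_ssim is a hypothetical callback that runs a full two-pass encode at the given bitrate and returns the measured SSIM.

```python
def min_bitrate_for_ssim(measure_ssim, target, lo_kbps=50.0, hi_kbps=500.0, tol_kbps=5.0):
    """Bisection for the lowest bitrate (within tol_kbps) whose SSIM reaches `target`.

    measure_ssim(kbps) is a hypothetical encode-and-measure callback;
    SSIM is assumed to be monotonically non-decreasing in bitrate.
    """
    if measure_ssim(lo_kbps) >= target:
        return lo_kbps
    if measure_ssim(hi_kbps) < target:
        raise ValueError("target SSIM is not reached even at hi_kbps")
    # Invariant: SSIM(lo) < target <= SSIM(hi)
    while hi_kbps - lo_kbps > tol_kbps:
        mid = (lo_kbps + hi_kbps) / 2
        if measure_ssim(mid) >= target:
            hi_kbps = mid
        else:
            lo_kbps = mid
    return hi_kbps
```

Returning the upper bound matches the approach in the text: the x264 SSIM ends up slightly above Theora's rather than below it. Dividing Theora's bitrate by the value found gives the size ratio, and since every call is a full encode, bisection keeps the number of encodes to a handful.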

How to improve x264 results

x264 simply has a lot of tuning options, which cannot be said of Ogg Theora. So as long as codec settings are not published, you can get very different results.
Here is how one could tilt the comparison in favor of x264:

Increase the number of reference frames
Increase the number of consecutive B-frames
Increase the maximum motion estimation range
Use --tune ssim or --tune psnr, each of which can improve the score on one metric while slightly worsening the other (the first comparison, by the Theora developers, used only PSNR)
Enable and tune psychovisual optimizations if the comparison were subjective
Tweak the deblocking options
Use other, trickier settings that exploit the characteristics of the test video

How to degrade x264 results

The x264 analogue preset already disables a lot of good and useful features. But if I really wanted Theora to win my comparison, here is what else I could do:

Disable multiple reference frames
Leave the default psychovisual optimizations on, which lower both SSIM and PSNR
Choose a lower-quality motion estimation algorithm
Allow only 16x16 blocks
Misconfigure adaptive quantization to tank one of the metrics
Use any other unsuitable settings
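To show how small such changes look on the command line, here is a sketch that applies overrides to the x264 normal preset. The option names (--ref, --bframes, --merange, --tune, --me, --partitions, --no-psy) are real x264 options, but the values are hypothetical examples of the kinds of changes listed above, not settings used in any published comparison. The helper replaces an existing option instead of appending a duplicate, so the result does not depend on how x264 resolves repeated flags.

```python
import shlex

X264_NORMAL = ("--bframes 4 --b-pyramid normal --partitions all --me umh --no-psy "
               "--trellis 2 --no-fast-pskip --no-dct-decimate --subme 10 --b-adapt 2 --direct auto")

def parse(preset):
    """Split a preset string into (option, value-or-None) pairs."""
    tokens = shlex.split(preset)
    pairs, i = [], 0
    while i < len(tokens):
        opt = tokens[i]
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
            pairs.append((opt, tokens[i + 1]))
            i += 2
        else:
            pairs.append((opt, None))  # boolean flag such as --no-psy
            i += 1
    return pairs

def override(preset, changes, drop=()):
    """Return argv with `changes` (option -> value or None) replacing existing
    entries, and any option listed in `drop` removed."""
    pairs = [(o, v) for o, v in parse(preset) if o not in changes and o not in drop]
    pairs += list(changes.items())
    argv = []
    for opt, val in pairs:
        argv.append(opt)
        if val is not None:
            argv.append(val)
    return argv

# Hypothetical "tilt up" and "tilt down" variants of the normal preset.
tuned = override(X264_NORMAL, {"--ref": "16", "--bframes": "16", "--merange": "32", "--tune": "ssim"})
degraded = override(X264_NORMAL, {"--ref": "1", "--me": "dia", "--partitions": "none"},
                    drop={"--no-psy"})  # dropping --no-psy leaves x264's default psy on
```

The point is not the specific values but how few tokens separate a fair comparison from a rigged one, which is why unpublished settings make a comparison impossible to verify.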

To summarize

H.264 is more efficient than Ogg Theora in terms of quality per size. It is also much more flexible, letting you vary the parameters significantly depending on the task.

And do not blindly trust comparisons that do not publish their codec settings: there is far too much room to maneuver.

Video Links

Encoding results (25 MB)
Original video (365 MB): it is this large because it is encoded with the lossless huffyuv codec.
