Building your own video platform, or how to consume a lot of computing resources. Part 1

    [Pictured: the first four-wheeled flying bike. Source.]

    Today, thanks to the availability of ready-made services, putting video on the web is not a difficult task. However, there is not much material on the internal design of such systems, especially in the Russian-language segment.

    I have been designing and developing a high-quality video platform for some time. In this article I want to describe the things I wish I had known at the start of development.

    The article does not claim to be a guide; in it I will only try to cover interesting or non-obvious points that affect the processing and delivery of HTML5-based video content.
    The material is aimed at those who are already in the subject, or are ready to look up unfamiliar abbreviations, terms and concepts.

    The second part of the article is published separately.


    H264 High profile, despite its popularity, turns out not to work everywhere: some browsers do not ship support for it. Fortunately, on modern devices VP8/VP9 works almost everywhere that H264 is missing. VP9 is preferable, since I have not come across decoders old enough to handle VP8 but not VP9 or H264. VP9 provides picture quality comparable to H264 at a bitrate roughly 30% lower, which matters for reducing the load on the channels. In addition, while the use of MPEG codecs can potentially attract legal claims (a very complicated story), VP9 has no such problem. The catch is that VP9 encodes roughly an order of magnitude slower, so more resources need to be allocated for its processing.

    If you need to support old hardware that cannot cope with H264 High, you can add 480p H264 Main as a third format with a lower bitrate.

    It is better not to rely on Hi10P at scale due to weak hardware decoding support.
    H265 clearly comes with license fees, which not everyone can afford.

    Software vs hardware

    Hardware encoders do not use most of the codecs' advanced features (saving die area takes its toll) and produce suboptimally compressed files. The choice of formats is limited, and not all encoding parameters can be tuned: often the only value that actually affects the result is the bitrate, and even that is interpreted rather peculiarly. If everything is done right, decent chips can produce a quite sane result, albeit with a near-constant bitrate (which sags in dynamic scenes) that also runs a bit too high.

    And, of course, a hardware encoder needs a device to run on: a video card or a processor with a video core, which not every server has.

    But they are fast. Very fast. Compared to software processing, the speed can grow a couple of hundred times, to the point where disk IO may become the bottleneck.

    Hardware processing depends heavily on the solution vendor: each has its own set of libraries and utilities, and there is something to choose from: Intel Quick Sync, NVENC, AMD VCE.

    Software processing has no such restrictions, and at an equivalent bitrate the result is better. For working with different formats and codecs there is ffmpeg; the hardware route has no such luxury (with caveats).

    Video quality criteria

    To define the target quality, the easiest approach is to count bits per pixel (BPP). This parameter is independent of resolution, frame rate and duration. The bitrate is then derived from it by the formula
    bitrate = BPP * framerate * width * height
    Optimal BPP values are best chosen by your own experiments on the kind of video you plan to process. A good starting value for H264 is around 0.09 bit/pixel. For more efficient codecs such as H265 and VP9, this parameter can be reduced in proportion to their relative compression ratio. BPP can also be reduced slightly for high-resolution video, because codec efficiency grows slightly with resolution; however, this correction must take into account the resolution of the coding sections (slices, a codec feature that encodes video in semi-independent blocks of fractional resolution).

    For the resulting bitrate, it is advisable to set maximum values in advance, based on the expected internet speed of the client: very few people enjoy watching a very high-quality but constantly buffering video.

    That is also why the Q-parameters of codecs (abstract "quality units") are inconvenient to use: fixed values give an unpredictable final bitrate.

    maxRate is best set with a margin, because codecs may not hold the requested values accurately, even with two-pass encoding.

    To preserve the quality of dynamic scenes, it is better to enable the codecs' VBR mode; however, minRate should be set to at least 90% of the target bitrate so that rate swings do not lead to buffer underruns.
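A minimal sketch of deriving the rate limits: the 90% floor for minRate comes from the text, while the 20% headroom on maxRate is an assumed illustrative value, not a recommendation from the article.

```python
def rate_limits(target_bps, min_ratio=0.9, max_margin=1.2):
    """Derive VBR rate limits from a target bitrate.

    min_ratio: keep minRate at >= 90% of the target (per the text);
    max_margin: assumed 20% headroom on maxRate, since codecs can
    overshoot the requested values even with two-pass encoding.
    """
    return {
        "minrate": int(target_bps * min_ratio),
        "maxrate": int(target_bps * max_margin),
    }

print(rate_limits(5_600_000))
# prints: {'minrate': 5040000, 'maxrate': 6720000}
```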

    For quality control, utilities such as Intel VPA, ffprobe, and Python are useful. With the latter it is convenient to compare the source and the converted video and to compute arbitrary metrics, such as the average pixel deviation.
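The average pixel deviation mentioned above might look like this; in a real pipeline the frames would come from decoded video (e.g. via ffmpeg), whereas here they are tiny synthetic grayscale lists for illustration.

```python
def mean_pixel_deviation(frame_a, frame_b):
    """Average absolute per-pixel difference between two equally
    sized grayscale frames (flat sequences of 0-255 values)."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

source    = [10, 20, 30, 40]
converted = [12, 18, 30, 44]
print(mean_pixel_deviation(source, converted))  # prints: 2.0
```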

    In practice, computing PSNR and SSIM is largely pointless because of the psycho-visual optimizations enabled by default in codecs. If you want to compute these metrics more or less adequately, you can disable the optimizations with
    -tune [psnr|ssim]
    However, the final file will, of course, differ from one produced without these flags.
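For reference, PSNR itself is straightforward to compute from the mean squared error; this is the standard textbook definition, again on flat grayscale pixel sequences rather than real decoded frames.

```python
import math

def psnr(frame_a, frame_b, max_value=255):
    """Peak signal-to-noise ratio in dB between two flat,
    equally sized pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)

print(round(psnr([10, 20, 30, 40], [12, 18, 30, 44]), 1))  # ~40.3 dB
```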


    The main problem with generating preview images is blurry source frames. Detecting and searching for sharp frames is a very nontrivial and resource-intensive task. Fortunately, in most codecs the solution is already built into the encoding process: you can take the key frame closest to a given position, and of all the surrounding frames it will be the sharpest. In ffmpeg it looks like this:

    -ss [position] -vf "select='eq(pict_type,PICT_TYPE_I)'" -vsync vfr
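A sketch of wrapping that invocation from Python: the function only builds the argument list (the output filename and the addition of -frames:v 1 to stop after one frame are my assumptions, not from the text), so it can be inspected or passed to subprocess.run.

```python
def thumbnail_cmd(src, position, out="preview.png"):
    """Build an ffmpeg argv that grabs the key frame nearest
    `position` (seconds), using the select filter shown above."""
    return [
        "ffmpeg", "-ss", str(position), "-i", src,
        "-vf", "select='eq(pict_type,PICT_TYPE_I)'",
        "-vsync", "vfr",
        "-frames:v", "1",  # assumed: stop after the first matching frame
        out,
    ]

print(thumbnail_cmd("input.mp4", 42))
```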

    Standard encoders do not compress images particularly well, so after extracting the frame it is better to squeeze it with something like optipng: an average saving of about 500 kB on a FHD preview.

    High-resolution images are best saved interlaced. This slightly increases the size (by 5-10%) but seriously reduces the time before something appears on a loading page.

    The article has already turned out dense, and I doubt all the information should be packed into one huge text. If a continuation on this topic is interesting, write in the comments or vote in the poll.

    The platform is closed-source, but you can see it in action here.

    * I am not affiliated with the authors of the relevant sites and may not share their views and opinions. I cannot comment on decisions about who gets access to the code and how.

    Ready to answer questions.


