A server-based video encoding solution using integrated Intel HD Graphics


    In a previous article, we talked about video encoding using Intel Quick Sync technology on modern Intel processors and the experience we gained in integrating this technology into our software. This time I’ll talk about how we created the server solution, the problems we encountered, and the performance of our solution on Intel server processors. I take this opportunity to thank our colleagues from Intel for their prompt assistance in the process of integrating Intel Quick Sync into our software.

    Testing

    To test our software, we chose a 1U server with the following configuration:
    Motherboard: Supermicro X10SLH-F
    CPU: Intel® Xeon® CPU E3-1225 v3 @ 3.20GHz
    Memory: 16 GB
    OS: Ubuntu Server 12.04.4 LTS, kernel 3.8.0-23-generic
    The main requirement for Quick Sync is that the chipset specification lists C226. Only chipsets with this marking support hardware video encoding. It is also desirable that the motherboard has no video adapter of its own; otherwise the Intel GPU may not be detected and therefore cannot be used through the Intel Media SDK.
    The motherboard described above does have its own onboard video, and we had to tinker to make the SDK work on this hardware. On the new server, the Media SDK installation script could not find the GPU device ID, and the BIOS offered no option to enable the processor's integrated graphics. The search for a solution led us to update the BIOS, after which the coveted option appeared. We still had to disable the motherboard's onboard video by moving a jumper. In this configuration IPMI and monitor output do not work, but we manage the server over SSH, so this is not critical.
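    A quick way to see whether the kernel exposes the Intel GPU at all is to look at the DRM entries in sysfs. The sketch below is not part of the Media SDK installer; it only assumes the standard Linux layout, where /sys/class/drm/cardN/device/vendor holds the PCI vendor ID (0x8086 for Intel). The function name is made up for this example.

```python
from pathlib import Path

INTEL_VENDOR_ID = "0x8086"  # PCI vendor ID assigned to Intel


def find_intel_gpus(drm_root="/sys/class/drm"):
    """Return the names of DRM cards whose PCI vendor ID is Intel's."""
    found = []
    root = Path(drm_root)
    if not root.is_dir():
        return found  # not Linux, or no DRM driver loaded
    for card in sorted(root.glob("card[0-9]*")):
        # Skip connector entries such as card0-VGA-1.
        if "-" in card.name:
            continue
        try:
            vendor = (card / "device" / "vendor").read_text().strip().lower()
        except OSError:
            continue  # entry without a readable vendor file
        if vendor == INTEL_VENDOR_ID:
            found.append(card.name)
    return found


if __name__ == "__main__":
    gpus = find_intel_gpus()
    if gpus:
        print("Intel GPU visible as:", ", ".join(gpus))
    else:
        print("No Intel GPU found; check the BIOS settings and onboard video jumper")
```

    If the list is empty on a board like ours, the processor graphics is most likely still disabled in the BIOS or overridden by the onboard adapter.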
    In addition, the SDK restricts which Linux kernels can be used. For servers, this means Ubuntu 12.04 LTS with kernel 3.2.0-41 or 3.8.0-23, or SUSE Linux Enterprise Server 11 SP3 with kernel 3.0.76-11.
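    A minimal sketch of how a deployment script might check the running kernel against this list. The kernel strings are taken from the paragraph above exactly as written; the matching rule (exact match or the listed release followed by a "-flavour" suffix) is an assumption, and the real `uname -r` output on a given distribution may be formatted differently.

```python
import platform

# Kernel releases named in the article, keyed by distribution.
SUPPORTED_KERNELS = {
    "Ubuntu 12.04 LTS": ("3.2.0-41", "3.8.0-23"),
    "SUSE Linux Enterprise Server 11 SP3": ("3.0.76-11",),
}


def matching_distribution(release):
    """Return the distribution whose listed kernel matches `release`, or None."""
    for distro, kernels in SUPPORTED_KERNELS.items():
        for kernel in kernels:
            # `uname -r` usually appends a flavour, e.g. "3.8.0-23-generic".
            if release == kernel or release.startswith(kernel + "-"):
                return distro
    return None


if __name__ == "__main__":
    running = platform.release()
    distro = matching_distribution(running)
    if distro:
        print(f"Kernel {running} is on the supported list ({distro})")
    else:
        print(f"Kernel {running} is not on the supported list")
```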

    We also optimized how raw frames are passed through our pipeline by using the SDK's native memory type, which improved performance and let us squeeze the maximum speed out of the hardware. With this approach only a pointer to the surface is passed along the pipeline, and the frame data is never physically copied.
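    The difference between the two approaches can be illustrated with a toy pipeline. This is only a conceptual sketch in Python, not the Media SDK API and not streambuilder code: the Surface class and the stage functions are hypothetical stand-ins for a frame buffer and for pipeline elements.

```python
class Surface:
    """Stand-in for a frame buffer owned by the decoder (hypothetical type)."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        # One byte per pixel (luma only) keeps the example small.
        self.data = bytearray(width * height)


def stage_copy(surface):
    """Copying stage: duplicates the pixel data before handing the frame on."""
    clone = Surface(surface.width, surface.height)
    clone.data[:] = surface.data
    return clone


def stage_by_reference(surface):
    """Zero-copy stage: hands the very same surface object to the next stage."""
    return surface


def run_pipeline(surface, stage, hops=3):
    """Pass a frame through `hops` consecutive stages."""
    for _ in range(hops):
        surface = stage(surface)
    return surface


frame = Surface(1920, 800)
copied = run_pipeline(frame, stage_copy)
shared = run_pipeline(frame, stage_by_reference)

print(copied.data is frame.data)  # False: every hop allocated and filled a new buffer
print(shared.data is frame.data)  # True: only the reference travelled down the pipeline
```

    With copying, each hop moves 1920 x 800 bytes per frame even in this simplified one-plane model; with references, the cost per hop does not depend on the frame size.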

    The test source was a 12-minute 1920x800 H.264 video. Output video: 1920x800, H.264, High profile, 8 Mbit/s. ffmpeg was run with default options (High profile). The sample_full_transcode test utility from the Intel Media SDK also encoded with default settings (High profile). Streambuilder with Quick Sync support encoded with the following parameters: profile high, RateControlMethod cbr, level avc 4.2. The target usage parameter (which trades quality against encoding speed) was set to balanced in all cases.
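    The exact ffmpeg command line is not given here, so the following is only an assumed reconstruction of a comparable software encode: libx264, High profile, 8 Mbit/s target bitrate, everything else left at ffmpeg defaults. The file names are placeholders.

```python
import shlex


def build_ffmpeg_command(src, dst, bitrate="8M", profile="high"):
    """Build an argument list for a software H.264 encode with libx264."""
    return [
        "ffmpeg",
        "-i", src,               # input file
        "-c:v", "libx264",       # software H.264 encoder
        "-profile:v", profile,   # H.264 profile
        "-b:v", bitrate,         # target video bitrate
        dst,
    ]


if __name__ == "__main__":
    cmd = build_ffmpeg_command("input.mp4", "output.mp4")
    # Print the command instead of running it, so the sketch works without ffmpeg.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

    The list can be passed to subprocess.run as is; wrapping the call in a timer gives a figure comparable to the "time" rows in the tables.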
    The test results are shown in the following tables.

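    Processor: E3-1225 v3, 16 GB RAM, Intel® HD Graphics P4600
    Metric       | ffmpeg     | sample_full_transcode | streambuilder (no optimization) | streambuilder (optimization)
    -------------+------------+-----------------------+---------------------------------+-----------------------------
    time         | 8 min 42 s | 1 min 19 s            | 2 min 19 s                      | 1 min 40 s
    cpu (max)    | 750%       | 55%                   | 125%                            | 50%
    mem (max)    | 3.3%       | 4.6%                  | 0.5%                            | 0.4%
    PSNR         | 48.107     | 46.68                 | -                               | -
    Average PSNR | 51.204     | 49.52                 | -                               | -
    SSIM         | 0.99934    | 0.9956                | -                               | -
    MSE          | 1.623      | 2.969                 | -                               | -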

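    Processor: i7-3770, 3 GB RAM, Intel® HD Graphics 4000
    Metric       | ffmpeg     | sample_full_transcode | streambuilder (no optimization) | streambuilder (optimization)
    -------------+------------+-----------------------+---------------------------------+-----------------------------
    time         | 8 min 48 s | 1 min 24 s            | 2 min 31 s                      | 1 min 23 s
    cpu (max)    | 750%       | 19%                   | 150%                            | 45%
    mem (max)    | 18%        | 20%                   | 2.8%                            | 2.3%
    PSNR         | 48.107     | 46.495                | -                               | -
    Average PSNR | 51.204     | 49.27                 | -                               | -
    SSIM         | 0.99934    | 0.991                 | -                               | -
    MSE          | 1.623      | 3.036                 | -                               | -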

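    Processor: E3-1285 v3, 16 GB RAM, Intel® HD Graphics P4700
    Metric       | ffmpeg     | sample_full_transcode | streambuilder (no optimization) | streambuilder (optimization)
    -------------+------------+-----------------------+---------------------------------+-----------------------------
    time         | 8 min 1 s  | 1 min 11 s            | 2 min 11 s                      | 1 min 34 s
    cpu (max)    | 750%       | 55%                   | 130%                            | 55%
    mem (max)    | 3.3%       | 4.6%                  | 0.5%                            | 0.4%
    PSNR         | 48.107     | 46.68                 | -                               | -
    Average PSNR | 51.204     | 49.52                 | -                               | -
    SSIM         | 0.99934    | 0.9956                | -                               | -
    MSE          | 1.623      | 2.969                 | -                               | -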

    Results Analysis

    The quality metrics for streambuilder match those obtained for the sample_full_transcode test utility, so I omitted them from the tables.
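    For readers unfamiliar with the metrics, here is the textbook relation between MSE and PSNR for 8-bit video, together with two common ways of aggregating it over frames. Which tool produced the numbers in the tables and how it aggregated them is not stated, so treating "PSNR" and "Average PSNR" as these two aggregations would be an assumption. The per-frame MSE values below are invented for the illustration.

```python
import math


def psnr_from_mse(mse, max_value=255.0):
    """PSNR in dB for a given mean squared error (8-bit samples by default)."""
    if mse <= 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def average_psnr(frame_mses):
    """Mean of the per-frame PSNR values."""
    values = [psnr_from_mse(m) for m in frame_mses]
    return sum(values) / len(values)


def global_psnr(frame_mses):
    """PSNR computed from the MSE averaged over all frames."""
    return psnr_from_mse(sum(frame_mses) / len(frame_mses))


if __name__ == "__main__":
    frame_mses = [0.8, 1.2, 2.5, 4.0]  # hypothetical per-frame MSE values
    print(f"average of per-frame PSNR: {average_psnr(frame_mses):.2f} dB")
    print(f"PSNR of the averaged MSE:  {global_psnr(frame_mses):.2f} dB")
```

    The average of per-frame PSNR values is never lower than the PSNR of the averaged MSE, because a few badly encoded frames dominate the mean MSE.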
    The tables show that, in this experiment, the server processors with Intel® HD Graphics P4700 / P4600 are faster and give better encoding quality than the i7-3770 with Intel® HD Graphics 4000. This will not always hold, however: Intel improves encoding quality with each new generation of chip and SDK, and a newer chip may turn out to be slower. The CPU load on the server processors is also somewhat higher; we do not yet know why.
    In addition, the memory-handling optimization gave a substantial speedup: judging by the streambuilder times in the tables, about 1.4x on the two Xeons and about 1.8x on the i7-3770.
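    The gain from the memory optimization can be recomputed from the two streambuilder columns of the tables. The sketch below simply divides the unoptimized time by the optimized one for each processor; the times are copied from the tables, and the helper names are made up for the example.

```python
def to_seconds(minutes, seconds):
    """Convert a 'min s' table entry to seconds."""
    return minutes * 60 + seconds


# (no optimization, optimization) streambuilder times from the tables
TIMES = {
    "E3-1225 v3": (to_seconds(2, 19), to_seconds(1, 40)),
    "i7-3770": (to_seconds(2, 31), to_seconds(1, 23)),
    "E3-1285 v3": (to_seconds(2, 11), to_seconds(1, 34)),
}


def speedup(before, after):
    """How many times faster the optimized run is."""
    return before / after


for cpu, (before, after) in TIMES.items():
    print(f"{cpu}: {speedup(before, after):.2f}x")
# E3-1225 v3: 1.39x
# i7-3770: 1.82x
# E3-1285 v3: 1.39x
```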

    Encoding quality on the Intel® HD Graphics P4700 is the same as on the Intel® HD Graphics P4600, but the E3-1285 v3 is about 14% faster at the same resource load. The E3-1285 v3 is also about 10% faster than the E3-1225 v3 when encoding with ffmpeg.
    A server running streambuilder with Quick Sync support can encode one source into 12 Full HD (1080p) renditions, 24 HD (720p) renditions, or 46 SD (480p) renditions, with segmentation into HLS. If the source is a raw SDI signal, the number of simultaneously encoded renditions is slightly higher.
    You can experiment with streambuilder (so far only the libavcodec-based version) by downloading it from here. It comes with a standard config that lets you record any source to HLS.
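    In HLS terms, each "quality" is a variant stream listed in a master playlist. The sketch below generates such a playlist using the standard #EXTM3U and #EXT-X-STREAM-INF tags. It does not show streambuilder's actual output; the bitrates, resolutions and URIs are hypothetical.

```python
def master_playlist(renditions):
    """Build the text of an HLS master playlist from a list of rendition dicts."""
    lines = ["#EXTM3U"]
    for r in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r['bandwidth']},"
            f"RESOLUTION={r['width']}x{r['height']}"
        )
        lines.append(r["uri"])  # media playlist of this rendition
    return "\n".join(lines) + "\n"


# Hypothetical ladder: one rendition per quality class mentioned in the text.
RENDITIONS = [
    {"bandwidth": 8000000, "width": 1920, "height": 1080, "uri": "1080p/index.m3u8"},
    {"bandwidth": 4000000, "width": 1280, "height": 720, "uri": "720p/index.m3u8"},
    {"bandwidth": 1500000, "width": 854, "height": 480, "uri": "480p/index.m3u8"},
]

if __name__ == "__main__":
    print(master_playlist(RENDITIONS), end="")
```

    The player picks one variant according to available bandwidth, which is why the number of renditions a single server can produce in real time matters.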

    Summary

    Intel Quick Sync technology lets you build a relatively low-cost, high-throughput server for encoding video at acceptable quality. While adopting it, we ran into some technical problems caused by the video adapter built into the motherboard, but these turned out to be entirely solvable. (Recall that the key points when choosing hardware for this purpose are a C226 chipset and a motherboard without its own onboard video; with onboard video you may have to disable it, after which IPMI and VGA output may not work.)
    The advantages of this solution, in my opinion, are that the CPU is barely used and memory consumption is low. The freed resources can be used for other tasks or for CPU-based encoding.

    In the near future we will experiment with the VPP (video processing) functions of the Intel Media SDK: denoise, crop, resize, frame rate conversion, deinterlacing, and so on. So far we have implemented crop, resize and deinterlacing, and these operations run as fast as their purely software counterparts. The Intel Media SDK has a lot of encoding parameters, and we continue to run tests and compare the results with our own profiles. I expect we will write more about the VPP experiments, performance versus quality, and a comparison of 2-pass ffmpeg H.264 encoding with the LookAhead technology of Intel HD Graphics.
