Improving the performance of multimedia applications with hardware acceleration

Original author: Gael Hofemeier

Intel processor architecture is increasingly focused on the GPU, which opens up opportunities for a sharp increase in performance simply by offloading multimedia processing from the CPU to the GPU. Many tools are available to developers for improving the performance of multimedia applications, and some of them are free and easy to use.
This publication covers:
  • An overview of computing architectures and current Intel GPU capabilities
  • Implementing hardware acceleration with FFmpeg
  • Implementing hardware acceleration with the Intel Media SDK or the similar Intel Media Server Studio (depending on the target platform)

If you need to improve multimedia processing performance but don't know where to start, start with FFmpeg. Measure performance with software-only processing, then turn on hardware acceleration and see how much performance changes. Then add the Intel Media SDK and compare again with different codecs and configurations.

Computing architecture: from superscalar to heterogeneous


To appreciate the importance of GPU development, let's start with the history of CPU architecture improvements.
Let's go back to the nineties. The first major stage was the emergence of superscalar architecture, in which high throughput was achieved through instruction-level parallelism within a single processor.


Figure 1. Superscalar architecture

Then, in the early 2000s, multi-core architecture appeared, with more than one computational core in a single processor. Homogeneous cores (all completely identical) made it possible to execute several threads simultaneously (thread-level parallelism).
At the same time, the performance of multi-core architecture was limited by a number of obstacles.
  • Memory: the gap between processor speed and memory speed widened.
  • Instruction-level parallelism (ILP): it became increasingly difficult to find enough instructions within a single thread that could run in parallel to keep a single high-performance core fully busy.
  • Power consumption: as processor clock speed gradually increased, power consumption grew exponentially.



Figure 2. Multi-core architecture

Modern heterogeneous architecture


In a heterogeneous architecture, there may be several processors that share a common data pipeline, each of which can be optimized for an individual function: encoding, decoding, conversion, scaling, deinterlacing, and so on.

In other words, this architecture brings tangible gains in both performance and power consumption that were previously out of reach. Figure 3 shows the development of the GPU over the last five generations: graphics processors are becoming increasingly important. Both with H.264 and with the more recent H.265 codecs, GPUs provide significant processing power, so that processing video at 4K and even higher resolutions is not only possible but also quite fast.


Figure 3. The development of heterogeneous architecture

GPU performance by generation


Figure 4 shows a sharp increase in computing power over just a few generations in which the GPU sat on the same chip as the CPU. If your application processes multimedia, you should offload that work to the GPU to achieve a speedup of 5 times or more (depending on the age and configuration of the system).


Figure 4. Improving graphics processing in each generation of Intel processors

Getting Started with GPU Programming


Step 1 is usually to measure H.264 performance, so that you can evaluate how performance changes as the code is refined. FFmpeg is often used to measure performance and to compare speed with and without hardware acceleration. FFmpeg is a very powerful yet fairly easy-to-use tool.

In step 2, test with different codecs and configurations. You can enable Intel Quick Sync Video hardware acceleration simply by changing the codec (replace libx264 with h264_qsv).

In step 3, add the Intel Media SDK.

Note. This publication discusses the use of these tools on the Windows* operating system. If you are interested in an implementation for Linux*, see Access Intel Media Server Studio for Linux codecs using FFmpeg.

▍Encoding and decoding with FFmpeg


Start with H.264 (AVC), since libx264 is the default software H.264 encoder in FFmpeg and produces high quality in software alone. Create your own test, then measure performance again after changing the codec from libx264 to h264_qsv. H.265 codecs are covered later.
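The comparison described above can be scripted. The following is a minimal sketch in Python, assuming an ffmpeg binary on the PATH that was built with both libx264 and h264_qsv. The helper names (build_encode_cmd, time_command) and the file names are hypothetical and only illustrate the idea of changing nothing but the codec between runs.

```python
import subprocess
import time


def build_encode_cmd(src, dst, codec):
    """Return an ffmpeg command that re-encodes the video stream of src with codec."""
    # -y overwrites dst without asking; -an drops audio so only video work is timed.
    return ["ffmpeg", "-y", "-i", src, "-an", "-c:v", codec, dst]


def time_command(cmd, run=subprocess.run):
    """Run cmd and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    run(cmd, check=True)  # raises if the command exits with an error
    return time.perf_counter() - start


# The two commands differ only in the codec name (file names are placeholders).
software_cmd = build_encode_cmd("input.mp4", "out.x264.mp4", "libx264")
hardware_cmd = build_encode_cmd("input.mp4", "out.qsv.mp4", "h264_qsv")
print(" ".join(software_cmd))
print(" ".join(hardware_cmd))
```

Calling time_command(software_cmd) and then time_command(hardware_cmd) on the same input gives two timings whose ratio is the speedup. Running each command several times and comparing the best results makes the measurement less sensitive to disk caching.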

Note that when working with video streams, you have to trade quality against speed. Faster processing almost always means lower quality and a larger file. You will have to find your own acceptable level of quality for the time you can spend on encoding. There are 11 presets for choosing a particular combination of quality and speed, from "fastest" to "slowest". There are several rate control methods:
  • 1-pass encoding with a constant bitrate (set with -b:v);
  • 2-pass encoding with a constant bitrate;
  • constant rate factor (CRF).
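As a rough illustration of the three rate control methods listed above, the sketch below builds the corresponding ffmpeg command lines for libx264 in Python. The function names, file names, and the 2600k bitrate are assumptions made for the example. The flags themselves (-b:v, -pass, -crf, -preset) are standard FFmpeg options, and 23 is the libx264 default CRF.

```python
import os


def one_pass_bitrate(src, dst, bitrate):
    """Single pass that targets a bitrate set with -b:v."""
    return [["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-b:v", bitrate, dst]]


def two_pass_bitrate(src, dst, bitrate):
    """Two passes: the first only gathers statistics, the second writes the file."""
    common = ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-b:v", bitrate]
    # Pass 1 discards its output (null muxer) and skips audio with -an.
    first = common + ["-pass", "1", "-an", "-f", "null", os.devnull]
    second = common + ["-pass", "2", dst]
    return [first, second]


def constant_rate_factor(src, dst, crf=23, preset="medium"):
    """Quality-targeted encoding: lower CRF means higher quality and larger files."""
    return [["ffmpeg", "-y", "-i", src, "-c:v", "libx264",
             "-preset", preset, "-crf", str(crf), dst]]


plans = {
    "1-pass": one_pass_bitrate("input.mp4", "out.1pass.mp4", "2600k"),
    "2-pass": two_pass_bitrate("input.mp4", "out.2pass.mp4", "2600k"),
    "crf": constant_rate_factor("input.mp4", "out.crf.mp4"),
}
for name, commands in plans.items():
    for command in commands:
        print(name + ":", " ".join(command))
```

Each function returns a list of commands so that the two-pass case, which needs two ffmpeg runs over the same input, can be handled by the same loop as the single-run cases.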

Intel Quick Sync Video supports decoding and encoding using Intel CPUs and integrated GPUs. Note that the Intel processor must be compatible with Quick Sync Video and with OpenCL*. For more information, see the Intel SDK for OpenCL* Release Notes. Decoding and encoding support is built into FFmpeg through codecs with the _qsv suffix. Quick Sync Video currently supports the following codecs: MPEG2 video, VC1 (decoding only), H.264, and H.265.

If you want to experiment with Quick Sync Video in FFmpeg, add libmfx. The easiest way to install this library is to use the libmfx version packaged by lu_zero.
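Before benchmarking, it can help to confirm that the FFmpeg build actually exposes the _qsv codecs. The sketch below, in Python, filters the listing printed by ffmpeg -encoders for names ending in _qsv. The function names are hypothetical, and the sample text is a hand-written excerpt in the layout that ffmpeg prints, not output captured from a real build.

```python
import subprocess


def qsv_codecs(listing):
    """Extract codec names ending in _qsv from `ffmpeg -encoders` style output."""
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        # Codec rows look like: " V..... h264_qsv   description"
        if len(fields) >= 2 and fields[1].endswith("_qsv"):
            names.add(fields[1])
    return sorted(names)


def list_qsv_encoders(ffmpeg="ffmpeg"):
    """Ask the local ffmpeg build which QSV encoders it exposes."""
    result = subprocess.run([ffmpeg, "-hide_banner", "-encoders"],
                            capture_output=True, text=True, check=True)
    return qsv_codecs(result.stdout)


# Illustrative input only; a real build prints many more rows.
sample = """\
 V..... libx264              libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)
 V..... h264_qsv             H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (Intel Quick Sync Video acceleration) (codec h264)
 V..... hevc_qsv             HEVC (Intel Quick Sync Video acceleration) (codec hevc)
"""
print(qsv_codecs(sample))  # → ['h264_qsv', 'hevc_qsv']
```

An empty list from list_qsv_encoders() suggests that the build was made without libmfx support.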
Quick Sync Video hardware-accelerated encoding example:

ffmpeg -i INPUT -c:v h264_qsv -preset:v faster out.qsv.mp4

FFmpeg can also use hardware acceleration when decoding, with the -hwaccel option.

The h264_qsv codec is very fast: even the slowest hardware-accelerated mode is much faster than software-only encoding at the lowest quality and highest speed.

To test H.265 codecs, you need either a build with libx265 enabled or your own build made according to the instructions in the Encoding Guide for FFmpeg and H.265 or in the x265 documentation. H.265 example:

ffmpeg -i input -c:v libx265 -preset medium -x265-params crf=28 -c:a aac -strict experimental -b:a 128k output.mp4
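The -hwaccel option mentioned above can be combined with the h264_qsv encoder so that both decoding and encoding run on the GPU. The following Python sketch builds such a command. It assumes that the input is an H.264 stream and that the build supports the qsv hardware acceleration method; the function name and the file names are hypothetical.

```python
def build_qsv_transcode_cmd(src, dst, preset="faster", hw_decode=True):
    """Build an ffmpeg command that encodes with h264_qsv, optionally decoding on the GPU too."""
    cmd = ["ffmpeg", "-y"]
    if hw_decode:
        # Options placed before -i apply to the input: use the QSV H.264 decoder.
        cmd += ["-hwaccel", "qsv", "-c:v", "h264_qsv"]
    # Options placed after -i apply to the output: encode with Quick Sync Video.
    cmd += ["-i", src, "-c:v", "h264_qsv", "-preset:v", preset, dst]
    return cmd


# Print both variants so they can be timed against each other.
print(" ".join(build_qsv_transcode_cmd("input.mp4", "out.qsv.mp4")))
print(" ".join(build_qsv_transcode_cmd("input.mp4", "out.qsv.mp4", hw_decode=False)))
```

Timing the two variants on the same file shows how much of the gain comes from hardware decoding as opposed to hardware encoding.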

For more information about using FFmpeg and Quick Sync Video, see Intel QuickSync Video and FFmpeg Cloud Computing.

Using Intel Media SDK (sample_multi_transcode)


To improve performance beyond what FFmpeg offers, optimize the application with the Intel Media SDK. The Media SDK is a cross-platform API for developing and optimizing multimedia applications so that they use Intel fixed-function hardware acceleration.

To get started with the Intel Media SDK, just follow a few simple steps:
  1. Download the Intel Media SDK for the target device.
  2. Download the tutorials and read them to understand how to configure the software using the SDK.
  3. Install the Intel Media SDK. If you are using Linux, see the installation guide for Linux.
  4. Download the sample SDK code to experiment with already compiled sample applications.
  5. Build and run the Video Transcoding application: sample_multi_transcode

Commands are similar to FFmpeg commands. Examples:

VideoTranscoding_folder\_bin\x64>\sample_multi_transcode.exe -hw -i::h264 in.mpeg2 -o::h264 out.h264
VideoTranscoding_folder\_bin\x64>\sample_multi_transcode.exe -hw -i::h265 in.mpeg2 -o::h265 out.h265

Note that to use hardware acceleration, you must specify the -hw option in the argument list. The same command also works with the HEVC (H.265) decoder and encoder, as in the second example, but they must be installed from the Intel Media Server Studio Pro release.

There are many options that you can specify on the command line. With the -u option you can set the target usage (TU), which plays the same role as FFmpeg presets. TU = 4 is used by default. Figure 5 shows performance at different TU settings.
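To compare target usage settings, the same transcode can be repeated with different -u values. The Python sketch below only builds the command lines. It assumes that -u accepts a numeric value, as the default of 4 suggests, and it uses 1, 4, and 7, which in the Media SDK correspond to best quality, balanced, and best speed. The function name, the executable path, and the output file names are hypothetical.

```python
def build_transcode_cmd(exe, src, dst, target_usage, src_codec="h264", dst_codec="h264"):
    """Build a sample_multi_transcode command line with hardware acceleration enabled."""
    return [exe, "-hw",
            "-i::" + src_codec, src,
            "-o::" + dst_codec, dst,
            "-u", str(target_usage)]  # assumed numeric form of the -u option


exe = r"_bin\x64\sample_multi_transcode.exe"  # placeholder path to the built sample
sweep = [build_transcode_cmd(exe, "in.mpeg2", "out_tu%d.h264" % tu, tu) for tu in (1, 4, 7)]
for command in sweep:
    print(" ".join(command))
```

Writing each run to its own output file keeps the results available for a later quality comparison alongside the timing.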


Figure 5. Example H.264 performance characteristics at different target usage settings

Using other Intel software

To refine the code further, you can use Intel optimization and profiling tools, including Intel Graphics Performance Analyzers (GPA) and Intel VTune Amplifier. In addition, Intel Video Pro Analyzer and Intel Stress Bitstreams and Encoder help you achieve high video and streaming quality, improve encoders and decoders, and speed up validation so that you can bring solutions to market quickly.

Conclusion


Computer architecture has changed significantly over the past 20 years, and the last five years alone have brought a significant increase in performance. Intel processors can now process multimedia directly on the GPU, making new usage models available to both end users and companies.

You can measure performance improvements yourself using FFmpeg, and further optimize your code using the free Intel Media SDK APIs. The transition from software processing to hardware acceleration increases system performance and reduces power consumption (and costs), and also provides additional computing resources sufficient to switch over to the H.265 codec family over time.

Additional Resources


  1. Install and run the Intel Media SDK on Windows
  2. FFMPEG.ORG
  3. Integration of Intel Media SDK with FFMPEG for operations of multiplexing, demultiplexing, encoding and decoding sound
  4. Intel Media SDK Tutorials for Clients and Servers
  5. Intel Graphics Performance Analyzers
  6. Intel VTune Amplifier
  7. Intel Media Server Studio
  8. FFmpeg-based application acceleration with Intel Quick Sync Video
  9. Intel QuickSync Video and FFmpeg*
  10. Intel QuickSync Video and FFmpeg: Installation and Verification
  11. Access Intel Media Server Studio for Linux Codecs with FFmpeg
  12. HEVC Codec Value (H.265)
