
Video processing on CPU and GPU. Expert Answers

In this post we publish Intel expert Dmitry Serkin's answers to the questions you asked earlier about video processing on the CPU and GPU. We apologize for the delay: it is due to the large time difference between us and Dmitry.
As usual, to make searching easier, each question is labeled with its author's Habr username.
Question Maratyszcza
Will Intel processors get hardware blocks for compression algorithms other than video, for example DEFLATE?
I don't think so. What exists is optimization for specific processors: Intel Integrated Performance Primitives (IPP) includes the ZLIB, DEFLATE, and GZIP families of functions, optimized at both the algorithm and the instruction level.
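The three names above belong to one family: ZLIB and GZIP are containers around the same raw DEFLATE stream. A minimal sketch using Python's standard `zlib` module (not Intel IPP; it is used here only to show the relationship) produces all three from the same input by changing the `wbits` parameter:

```python
import gzip
import zlib

def deflate_variant(data: bytes, wbits: int) -> bytes:
    # wbits selects the container around the same DEFLATE stream:
    #   -15 -> raw DEFLATE (no header, no checksum)
    #    15 -> ZLIB (2-byte header + 4-byte Adler-32 trailer)
    #    31 -> GZIP (10-byte header + CRC-32 and length trailer, 8 bytes)
    comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    return comp.compress(data) + comp.flush()

payload = b"video frame metadata " * 200

raw = deflate_variant(payload, -15)
zl = deflate_variant(payload, 15)
gz = deflate_variant(payload, 31)

# Each variant round-trips with the matching decoder.
assert zlib.decompress(raw, -15) == payload
assert zlib.decompress(zl) == payload
assert gzip.decompress(gz) == payload

print(len(raw), len(zl), len(gz))
```

Only the wrapper differs, so a library that accelerates the core DEFLATE step speeds up all three formats at once.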
Question lifestar
Which codecs does the CPU's hardware encoder support?
If we are talking only about encoding: H.264, MPEG-2, MJPEG, and MVC (for stereoscopic 3D). Several more widely used codecs are on the way.
Question JDima
Can QuickSync be expected to match x264 in output image quality?
If we are talking about the quality-oriented presets (encoding settings), we will never catch up. Coding quality improves with each new platform, because there are more hardware resources and therefore more room to improve algorithms such as motion estimation and bitstream packing. x264 uses very good algorithms (slow, but quality-improving), including RDO (rate-distortion optimization). All of this maps extremely poorly onto a pipelined hardware architecture. For the medium presets, QuickSync is quite competitive. Of course, everything depends on the final codec settings, of which there are many. Keep in mind that quality and speed do not go hand in hand. The goal of QuickSync is to encode fast, with quality that is good enough for 99% of users, and the technology does exactly that. Meanwhile, work on gaining more dB of quality goes on every day.
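RDO, mentioned above, picks among coding options by minimizing a Lagrangian cost J = D + λ·R, where D is the distortion and R is the number of bits. The toy sketch below shows only that decision step; the candidate modes and their D and R values are made up for illustration. In a real encoder each candidate has to be transformed, quantized, and entropy-coded before D and R are known, which helps explain why exhaustive RDO is slow and hard to fit into a fixed hardware pipeline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    distortion: float  # e.g. sum of squared errors vs. the source block
    rate_bits: float   # bits needed to signal this mode and its residual

def rd_cost(c: Candidate, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return c.distortion + lam * c.rate_bits

def choose_mode(candidates: list[Candidate], lam: float) -> Candidate:
    # Pick the candidate with the lowest RD cost.
    return min(candidates, key=lambda c: rd_cost(c, lam))

# Hypothetical measurements for one macroblock.
modes = [
    Candidate("intra16x16", distortion=900, rate_bits=40),
    Candidate("inter16x16", distortion=1000, rate_bits=12),
    Candidate("skip", distortion=1600, rate_bits=1),
]

for lam in (1, 10, 100):
    print(lam, choose_mode(modes, lam).name)
# prints: 1 intra16x16, then 10 inter16x16, then 100 skip
```

A larger λ weights bits more heavily, so the choice drifts toward cheaper modes as the target bitrate drops.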
Question weatherman
Do the HD 4000 and the new HD 5000 differ greatly in performance? Can you give some examples with modern games?
According to recent press releases, performance has increased by up to 3 times and power consumption has been halved. I have not seen public game benchmarks; they should appear a few weeks before the Haswell launch, which, as far as I remember, is scheduled for June. Unfortunately, I can't give examples, since this is not my area: I work on codecs.
Questions tp7
1. Are there any plans to support hardware decoding of high-bit-depth video, for example H.264 Hi10P or the “higher” HEVC profiles?
I have no such information, and plans can change. If these profiles come into mass use, they will very likely be supported.
2. I remember that some time ago there were attempts at a dialogue with free codec developers about what they would like to see in new Intel processors. What is the situation now? Do open-source developers influence Intel, and does Intel support them in any way?
Rather at the level of applications than of individual developers. The recent announcement of QuickSync support in HandBrake is one such event: it is Intel's contribution to a free product. Such activities will become more and more frequent, since QuickSync development for Linux and its derivatives (Android) is in full swing.
As for giving direct access to the driver and hardware, I have not heard of such activities. Besides, I consider them pointless: such work is far from trivial, and there is the Media SDK, which provides higher-level primitives.
3. At the moment there are basically no good GPU encoder implementations (there are only a few, and none of them stands out in quality or offers a particular speed advantage). Why is that, and are there any positive developments in this area?
I consider QuickSync a very successful solution that offers both speed and good quality (relative to that speed). As for the solutions from AMD or Nvidia, their shortcomings can be explained by an architecture that differs from Intel's. Their solutions are all built on execution units (EUs) and multithreading, which are hard to exploit in codecs (some cornerstone algorithms do not lend themselves to multithreading). QuickSync combines EUs with fixed-function logic (algorithmic blocks “hard-wired” into the hardware). This combination yields an excellent gain in both performance and quality.
4. It's no secret that the computational demands of the recently released HEVC and VP9 are enormous. How soon do you think a processor or software will be able to process (at least decode) HD video in these formats in real time?
I believe this will become possible within a couple of years.
5. How widely is hand-written assembly used in Intel multimedia products, or do you rely more on compiler optimization? Do you use C++, or just good old C? How much time does performance optimization take compared to implementing the new functionality itself?
All's fair in war :) We use all of the above at the driver level and below. Hardware-specific assembly is, of course, generated from C-like code and then optimized by hand. Everything takes a lot of time. There is a lot of research on both quality and performance, but everything has a deadline. I won't give an exact ratio, but research, of course, takes more time.
6. How big is the multimedia team at Intel? How hard is it to join you? :)
Counting everyone from hardware and drivers to the various SDKs, there are thousands of people. It depends on which position you are aiming for ;) In Russia (Moscow and Nizhny Novgorod) there is a large team working on the Intel Media SDK, and vacancies open there from time to time.
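The answer to question 3 notes that some cornerstone codec algorithms do not lend themselves to multithreading. One commonly cited example is context-adaptive entropy coding (such as CABAC in H.264): each symbol is coded with a probability that the previous symbol just updated, so the loop over symbols within a slice cannot simply be split across threads. The sketch below is a deliberately simplified adaptive model (a count-based estimator, not the actual CABAC state machine) that computes the ideal code length of a bit string; the point is that every iteration reads state written by the one before it.

```python
import math

def adaptive_code_length(bits: list[int]) -> float:
    """Ideal code length in bits under a simple adaptive binary model.

    Simplified stand-in for context-adaptive entropy coding, not CABAC.
    """
    counts = [1, 1]  # Laplace-style initial counts for symbols 0 and 1
    total = 0.0
    for b in bits:
        # The probability depends on every symbol coded so far, which
        # creates a serial dependency from one iteration to the next.
        p = counts[b] / (counts[0] + counts[1])
        total += -math.log2(p)
        counts[b] += 1  # update the model only after coding the symbol
    return total

print(adaptive_code_length([0] * 7))      # approximately 3.0
print(adaptive_code_length([0, 1] * 4))   # noticeably more bits
```

Parallel encoders usually work around this by splitting the picture into independent slices or by pipelining whole frames, not by parallelizing the entropy-coding loop itself.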
Question RussianNeuroMancer
Is the problem in the hardware or in the driver?
Most likely in the driver. On Windows this is a known issue related to certain OS-level restrictions, but it is solvable. I wrote about it in more detail and in plainer terms here.
Question Ilya_Smelykh
Will there be hardware color space conversion for the most popular formats? What about hardware deinterlacing?
All of this is already available, for both planar and packed formats, and more will come. Deinterlacing is also supported.
Question Aingis
As you know, last fall Apple released the 13-inch MacBook Pro with Retina display. It has no discrete graphics card; all graphics run on the Intel HD 4000. According to some reviews, this platform is simply not powerful enough to fully support the Retina display. What does Intel plan to do so as not to fall behind at least the Retina iPad in graphics?
I think the graphics are developing quite quickly and powerfully. Intel Iris should dot all the i's.
Question diger
Give us an example of video encoding on a GPU for home use.
The most common example is encoding for mobile devices. If you want to transcode an episode of a TV series into a format your mobile device supports in a few minutes rather than waiting half an hour, QuickSync will help.
Question Russelll
Will there be 64-bit drivers for the Intel 3650?
I'm sorry, but I have no such information. Judging by the forums, though, it's a hot topic.
Questions sancho2222
1. Is there something similar to KUDA in Intel processors?
You mean Nvidia CUDA? The answer is Intel OpenCL.
2. Which libraries do I need to use the graphics capabilities of an Intel processor, in particular H.264 encoding/decoding?
All you need is the Intel Media SDK.
3. Is the Intel i7-3517UE powerful enough to decode and encode 960×720 H.264 video simultaneously?
Yes, of course. It can even handle several streams at once.
4. I have a problem with the Intel Atom N2800 processor; maybe you can help. I use ffmpeg to decode H.264 from a Logitech C920 camera at 960×720. After decoding I get frames in YUYJ420 format. At this resolution I can decode 2 streams at 24 frames per second, but if I rotate the video by 270 degrees after decoding, I hit cache limitations (as I understand it), and in the end I get only 20 frames per second from a single stream. If I increase the frame rate, the video breaks up into small squares and slows down terribly. What could the problem be? Is it really the cache?
Most likely you are running into the overall system performance. All operations run on the CPU, and with two streams plus post-processing it can no longer cope. To catch up on the delays, ffmpeg starts skipping frames, which is why you see artifacts. What is the CPU usage?
I didn't quite understand what the output format is. YUV420? Depending on the format, rotation requires a different set of operations. And yes, the cache is small, and, as you know, it does affect speed.
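For a planar YUV 4:2:0 frame (the I420 layout, which is also how ffmpeg's yuv420p and yuvj420p store pixels), rotation means rotating each of the three planes separately, with the chroma planes at half resolution. The sketch below assumes that layout and rotates by 90 degrees clockwise (equivalently 270 degrees counterclockwise); the function names are hypothetical. It also hints at why rotation is cache-sensitive: the inner loop walks down a source column, so consecutive reads are `width` bytes apart rather than adjacent.

```python
def rotate_plane_cw90(plane: bytes, width: int, height: int) -> bytes:
    """Rotate one 8-bit row-major plane 90 degrees clockwise.

    The result has `width` rows of `height` bytes each.
    """
    out = bytearray(width * height)
    for r in range(width):          # output row index
        row_base = r * height
        for c in range(height):     # output column index
            # Strided read: moves down a source column, `width` bytes per step.
            out[row_base + c] = plane[(height - 1 - c) * width + r]
    return bytes(out)

def rotate_i420_cw90(frame: bytes, width: int, height: int) -> bytes:
    """Rotate a planar I420 frame (Y, then U, then V) 90 degrees clockwise."""
    if width % 2 or height % 2:
        raise ValueError("I420 requires even width and height")
    y_size = width * height
    c_w, c_h = width // 2, height // 2
    c_size = c_w * c_h
    if len(frame) != y_size + 2 * c_size:
        raise ValueError("frame size does not match I420 dimensions")
    y = frame[:y_size]
    u = frame[y_size:y_size + c_size]
    v = frame[y_size + c_size:]
    return (rotate_plane_cw90(y, width, height)
            + rotate_plane_cw90(u, c_w, c_h)
            + rotate_plane_cw90(v, c_w, c_h))

print(list(rotate_plane_cw90(bytes([1, 2, 3, 4, 5, 6]), 3, 2)))
# prints [4, 1, 5, 2, 6, 3]
```

After rotation the output frame is `height` pixels wide and `width` pixels tall. Packed formats such as YUYV would need a different routine, since luma and chroma are interleaved in one buffer; production code typically processes the plane in small tiles to keep the strided reads within cache.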
Question yurasek
I'm interested in the potential of the H.264 hardware decoding logic built into 2nd- and 3rd-generation Intel Core processors. That is, how many real-time H.264 streams at, say, 1280×720 (or 1920×1080) and 25 frames per second can an Intel i7-3770 decode in hardware (assuming the program code is optimized as well as possible) for subsequent display on screen? How heavily will the processor's other units be loaded?
Good question. Physically, the number of streams is limited only by graphics memory: as long as there is enough memory to allocate surfaces, everything should work. Performance is another matter. It depends on the content you are going to decode; in other words, streams take different amounts of time and resources depending on how they were encoded. Taking all these factors (and many others) into account, my rough off-the-top-of-my-head estimate is up to 20 real-time sessions simultaneously.
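The memory side of that limit can be estimated with simple arithmetic. The sketch below assumes NV12 surfaces (12 bits per pixel, the layout commonly used for hardware H.264 decoding), dimensions padded to multiples of 16, and a hypothetical number of surfaces per decoding session; the 1 GiB memory budget is also made up. It bounds the session count by memory only; as the answer notes, decoding throughput depends on the content and is a separate limit, so these figures do not reproduce the estimate of 20 sessions.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def nv12_surface_bytes(width: int, height: int, alignment: int = 16) -> int:
    # NV12: full-resolution Y plane plus an interleaved UV plane at half
    # resolution in both directions, i.e. 1.5 bytes per pixel.
    w = align_up(width, alignment)
    h = align_up(height, alignment)
    return w * h * 3 // 2

def max_sessions_by_memory(budget_bytes: int, width: int, height: int,
                           surfaces_per_session: int) -> int:
    """Upper bound on concurrent sessions imposed by surface memory alone."""
    per_session = nv12_surface_bytes(width, height) * surfaces_per_session
    return budget_bytes // per_session

# Hypothetical numbers: 1 GiB of graphics memory, 20 surfaces per session.
budget = 1 << 30
print(nv12_surface_bytes(1280, 720))                   # 1382400
print(nv12_surface_bytes(1920, 1080))                  # 3133440 (height padded to 1088)
print(max_sessions_by_memory(budget, 1920, 1080, 20))  # 17
print(max_sessions_by_memory(budget, 1280, 720, 20))   # 38
```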