OpenCL: what is it and why is it needed (when there is already CUDA)?

Hello, dear Habra community.

Many of you have probably heard or read on Habr about OpenCL - a new standard for developing applications for heterogeneous systems. That's right: it is not a standard for developing GPU applications, as many people think. OpenCL was originally conceived as something more: a single standard for writing applications that must run on systems containing processors of various architectures, accelerators, and expansion cards.

OpenCL Prerequisites

Heterogeneous systems are found mainly in high-performance computing: from modeling physical processes in a boundary layer to video encoding and rendering three-dimensional scenes. Previously, such tasks were solved on supercomputers or very powerful desktop systems. With the advent of NVidia CUDA and AMD Stream, it has become relatively easy to write programs that use the computing power of the GPU.

It is worth noting that such programs had been written before, but it was NVidia CUDA that popularized GPGPU by making such applications easier to create. The first GPGPU applications used shaders as their kernels (the term used in CUDA and OpenCL), and the data was packed into textures, so you had to be well acquainted with OpenGL or DirectX. A little later the Brook language appeared, which made the programmer's life somewhat easier (AMD Stream is based on this language: it uses Brook+).

CUDA began to gain momentum, and meanwhile (or rather a little earlier), in a forge deep underground at the foot of Mount Fuji, Japanese engineers forged the Cell processor (in fact it was born of a collaboration between IBM, Sony and Toshiba). Cell is currently used in supercomputers supplied by IBM, and some of the most productive supercomputers in the world (according to top500) are built on it. Less than a year ago, Toshiba announced SpursEngine, a PC expansion card that accelerates video decoding and other demanding operations using the compute units (SPEs) designed for Cell. Wikipedia has an article briefly describing SpursEngine and how it differs from Cell.
Around the same time (about a year ago), S3 Graphics (in fact VIA) also came back to life, presenting its new S3 Graphics Chrome 500 graphics adapter to the public. According to the company, this adapter can also accelerate all kinds of computations. It ships with a software product (a graphics editor) that takes full advantage of this acceleration. A description of the technology is on the manufacturer's website.

So, what do we have? The machine that performs the computations may contain x86, x86-64 or Itanium processors, SpursEngine (Cell), an NVidia GPU, an AMD GPU, or a VIA (S3 Graphics) GPU. Each of these processor types has its own SDK (except perhaps VIA), its own programming language and its own programming model. That is, if you want your rendering engine or your Boeing 787 wing load calculation program to work on a simple workstation, on a BlueGene supercomputer, and on a computer equipped with two NVidia Tesla accelerators, you will have to rewrite a fairly large part of the program, since each platform's architecture imposes its own set of hard limits.
Programmers are lazy people: they do not want to write the same thing for 5 different platforms, taking all their peculiarities into account, or to learn different tools and models. Customers are greedy people: they do not want to pay for the program on each platform as a separate product, or for programmers' training courses. So it was decided to create a single standard for programs that run in a heterogeneous environment. This means that a program should, generally speaking, be able to run on a computer in which NVidia and AMD GPUs, a Toshiba SpursEngine, and so on are all installed at once.

Solution to the problem

To develop the open standard, it was decided to bring in people who already had (very successful) experience in developing such standards: the Khronos Group, which is already responsible for OpenGL, OpenML and much more. OpenCL is a trademark of Apple Inc., as stated on the Khronos Group website: “OpenCL is a trademark of Apple Inc., and is used under license by Khronos. The OpenCL logo and guidelines for its usage in association with Conformant products can be found here:
http://developer.apple.com/softwarelicensing/agreements/opencl.html.” Besides Apple, such IT heavyweights as AMD, IBM, Activision Blizzard, Intel, NVidia and others took part in the development (and, of course, the financing) (the full list is here).
NVidia did not particularly advertise its participation in the project and kept rapidly increasing the functionality and performance of CUDA. Meanwhile, several leading NVidia engineers took part in creating OpenCL. It was probably NVidia's participation that largely determined the syntactic and conceptual similarities between OpenCL and CUDA. Programmers only benefit from this: it will be easier to switch from CUDA to OpenCL if necessary.

The first version of the standard was published at the end of 2008 and has since undergone several revisions.

Almost immediately after the standard was published, NVidia announced that supporting OpenCL would pose no difficulty and that support would soon be implemented as part of the GPU Computing SDK on top of the CUDA Driver API. Nothing of the kind was heard from NVidia's main competitor, AMD.
NVidia has released an OpenCL driver, and it has passed conformance testing against the standard, but it is still available only to a limited group of people: registered developers (anyone can apply for registration; in my case the application took 2 weeks to review, after which an invitation arrived by email). The restricted access to the SDK and drivers suggests that there are still problems or bugs that have not been fixed, that is, the product is still in beta testing.
Implementing OpenCL was a fairly easy task for NVidia, since the main ideas are similar: both CUDA and OpenCL are extensions of the C language with similar syntax, and both use the same primary programming model, data parallel (SIMD). OpenCL also supports the task parallel programming model, in which different kernels can execute simultaneously (each work-group contains a single element). The similarity of the two technologies is also evident from the fact that NVidia released a special document on how to write CUDA code so that it is easy to move to OpenCL later.
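To make the data parallel model a little more concrete, here is a minimal sketch in plain Python. It is not real OpenCL or CUDA code: the names vector_add_kernel and run_ndrange are hypothetical and exist only for illustration. The idea it tries to show is that a kernel is a small function executed once per element of an index space, with each instance receiving its own index. In real OpenCL the kernel would be written in OpenCL C and would obtain that index with get_global_id(0).

```python
# Emulation of the data parallel model in plain Python (not real OpenCL).
# All names here are hypothetical and chosen for illustration only.

def vector_add_kernel(global_id, a, b, out):
    """Body of one work-item: handles exactly one element of the data."""
    out[global_id] = a[global_id] + b[global_id]


def run_ndrange(kernel, global_size, *args):
    """Run `kernel` once per index of a 1-D index space.

    A real device would run many of these work-items in parallel.
    This loop runs them one after another, which gives the same result
    here because the work-items do not depend on each other.
    """
    for global_id in range(global_size):
        kernel(global_id, *args)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
result = [0.0] * len(a)

run_ndrange(vector_add_kernel, len(a), a, b, result)
print(result)  # [11.0, 22.0, 33.0, 44.0]
```

In this picture, the task parallel case described above roughly corresponds to launching a kernel with an index space of size 1, so that several different kernels can be in flight at once.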

What is the current situation?

The main problem of NVidia's OpenCL implementation is poor performance compared to CUDA, but with each new driver release the performance of OpenCL on top of CUDA gets closer to that of native CUDA applications. According to the developers, CUDA applications themselves followed the same path, from relatively low performance with early driver versions to impressive performance today.

What was AMD doing in the meantime? After all, it was AMD (as a supporter of open standards: closed PhysX vs. open Havok; expensive Intel Thread Profiler vs. free AMD CodeAnalyst) that bet heavily on the new technology, given that AMD Stream could not come close to NVidia CUDA in popularity, which I attribute to Stream lagging behind CUDA technically.
In the summer of 2009, AMD announced support for and conformance with the OpenCL standard in the new version of the Stream SDK. In practice, it turned out that support was implemented only for the CPU. Yes, that's right, and there is no contradiction here: OpenCL is a standard for heterogeneous systems, and nothing prevents you from running kernels on the CPU. Moreover, this is very convenient when there is no other OpenCL device in the system: the program will still work, only more slowly. Alternatively, you can use all the computing power available in the computer, both GPU and CPU, although in practice this makes little sense: kernels running on the CPU take much longer than those running on the GPU, so the CPU becomes the bottleneck. For debugging applications, however, CPU support is more than convenient.
OpenCL support for AMD graphics adapters was not long in coming either: according to the company's latest reports, the version for graphics chips is now undergoing conformance testing against the standard's specification, after which it will be available to everyone.
OpenCL has to work on top of some hardware-specific layer, so for the standard to become truly uniform across heterogeneous systems, the corresponding layers (drivers) must also be released for IBM Cell and Intel Larrabee. So far nothing has been heard from these IT giants, so OpenCL remains one more GPU development tool alongside CUDA, Stream and DirectX Compute.
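The fallback to the CPU when no other device is present can be sketched as a simple preference search. The following is plain Python under stated assumptions, not a real OpenCL binding: the Device class, the PREFERENCE order and the device lists are all hypothetical stand-ins for what an OpenCL runtime would report. In real host code the device type is requested through clGetDeviceIDs with constants such as CL_DEVICE_TYPE_GPU and CL_DEVICE_TYPE_CPU.

```python
# Sketch of "prefer the GPU, fall back to the CPU" device selection.
# The Device class and the device lists are hypothetical stand-ins for
# what an OpenCL runtime would report; this is not a real OpenCL API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    kind: str  # "GPU", "ACCELERATOR" or "CPU"


# Assumed preference order: the device type expected to be fastest comes first.
PREFERENCE = ("GPU", "ACCELERATOR", "CPU")


def pick_device(devices):
    """Return the most preferred device from the list, or None if none matches."""
    for kind in PREFERENCE:
        for device in devices:
            if device.kind == kind:
                return device
    return None


workstation = [Device("Generic CPU", "CPU"), Device("Generic GPU", "GPU")]
laptop = [Device("Generic CPU", "CPU")]

print(pick_device(workstation).name)  # Generic GPU
print(pick_device(laptop).name)       # Generic CPU
```

The same kernels run in both cases; only the chosen device differs, which is what makes a CPU-only implementation useful for debugging.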

Apple also claims support for OpenCL, which, however, is provided by NVidia CUDA.
In addition, third-party developers currently offer:

OpenTK - a wrapper library over OpenGL, OpenAL, and OpenCL for .NET.
PyOpenCL - a wrapper over OpenCL for Python.
Java wrapper for OpenCL.

Conclusion

OpenCL is of interest to a wide range of IT companies, from game developers to chip manufacturers, which means it has a good chance of becoming the de facto standard for high-performance computing development, taking that title from CUDA, the current leader in this sector.

In the future I plan to write a more detailed article on OpenCL itself, describing what the technology is, its features, and its advantages and disadvantages.
Thank you for your attention.

Interesting links:

www.khronos.org/opencl - the page about OpenCL on the Khronos Group website
www.nvidia.com/object/cuda_opencl.html - NVidia OpenCL (at the bottom of the page links to various documents: OpenCL Programming Guide etc.)
forums.amd.com/devforum/categories.cfm?catid=390&entercat=y - AMD OpenCL forum
habrahabr.ru/blogs/CUDA/55566 - a small overview of OpenCL on
developer.amd.com/GPU/ATISTREAMSDKBETAPROGRAM/Pages/default.aspx - AMD Stream SDK page v2.0 beta
ati.amd.com/technology/streamcomputing/gpgpu_history.html - AMD GPGPU development history
