CUDA: Getting Started

    This is the first article in a series about GPGPU programming with nVidia CUDA. I plan to keep each article short so as not to bore readers, but to publish fairly often.

    I assume the reader already knows what CUDA is; if not, an introductory article can be found on Habr.

    What you need to get started:

    1. An nVidia video card from the GeForce 8xxx / 9xxx series or newer
    2. CUDA Toolkit v.2.1 (download can be found here: )
    3. CUDA SDK v.2.1 (available from the same place as the Toolkit)
    4. Visual Studio 2008
    5. CUDA Visual Studio Wizard (download here: )

    Creating a CUDA project:

    Once everything is installed, Visual Studio offers a new C++ project type named CUDA WinApp, which is exactly what we need. Projects of this type have additional CUDA settings that let you configure the GPU compilation options, for example the Compute Capability version to target, which depends on the GPU model.
    I usually create an empty project, since precompiled headers are unlikely to be useful for CUDA.
    It is important to understand how a CUDA application is built. Files with the *.cpp extension are compiled by the MS C++ compiler (cl.exe), while files with the *.cu extension go to the CUDA compiler (nvcc.exe), which determines which code will run on the GPU and which on the CPU. The CPU code from *.cu files is then passed on to the MS C++ compiler. This is convenient for writing dynamic libraries that export functions which use the GPU for calculations.
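    To make the *.cu split more concrete, here is a minimal sketch of what such a file might look like. The file name scale.cu and the names scaleKernel and scaleOnGpu are hypothetical and chosen only for illustration; they do not come from the SDK or from the wizard.

```cuda
// scale.cu (hypothetical file name, for illustration only)
#include <cuda_runtime.h>

// Device code: nvcc compiles this kernel for the GPU.
__global__ void scaleKernel(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Host code: nvcc passes this function on to the host compiler (cl.exe on Windows).
// extern "C" keeps the name unmangled so that a *.cpp file can declare and call it.
// To export it from a DLL on Windows, __declspec(dllexport) would be added as well.
// Returns 1 on success and 0 on failure.
extern "C" int scaleOnGpu(float* hostData, float factor, int n)
{
    if (n <= 0)
        return 0;

    float* devData = 0;
    size_t bytes = n * sizeof(float);

    if (cudaMalloc((void**)&devData, bytes) != cudaSuccess)
        return 0;

    cudaError_t status = cudaMemcpy(devData, hostData, bytes, cudaMemcpyHostToDevice);
    if (status == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // round up so every element is covered
        scaleKernel<<<blocks, threads>>>(devData, factor, n);
        status = cudaGetLastError();               // catches launch configuration errors
    }
    if (status == cudaSuccess) {
        // This copy waits for the kernel to finish before reading the results back.
        status = cudaMemcpy(hostData, devData, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(devData);
    return status == cudaSuccess ? 1 : 0;
}
```

    A *.cpp file in the same project only needs the declaration extern "C" int scaleOnGpu(float* hostData, float factor, int n); to call this function, and it never sees any CUDA-specific syntax.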
    The following is a listing of a simple CUDA program that displays information about the hardware capabilities of the GPU.

    Listing. CudaInfo program.



    #include <stdio.h>
    #include <cuda_runtime_api.h>

    int main()
    {
      int deviceCount = 0;
      cudaDeviceProp deviceProp;

      // How many CUDA devices are installed in this PC.
      cudaGetDeviceCount(&deviceCount);

      printf("Device count: %d\n\n", deviceCount);

      for (int i = 0; i < deviceCount; i++)
      {
        // Get the properties of device i.
        cudaGetDeviceProperties(&deviceProp, i);

        // Print the device properties.
        // size_t fields are cast so that the format specifier matches on 32- and 64-bit builds.
        printf("Device name: %s\n", deviceProp.name);
        printf("Total global memory: %llu\n", (unsigned long long)deviceProp.totalGlobalMem);
        printf("Shared memory per block: %llu\n", (unsigned long long)deviceProp.sharedMemPerBlock);
        printf("Registers per block: %d\n", deviceProp.regsPerBlock);
        printf("Warp size: %d\n", deviceProp.warpSize);
        printf("Memory pitch: %llu\n", (unsigned long long)deviceProp.memPitch);
        printf("Max threads per block: %d\n", deviceProp.maxThreadsPerBlock);
        printf("Max threads dimensions: x = %d, y = %d, z = %d\n",
          deviceProp.maxThreadsDim[0], deviceProp.maxThreadsDim[1], deviceProp.maxThreadsDim[2]);
        printf("Max grid size: x = %d, y = %d, z = %d\n",
          deviceProp.maxGridSize[0], deviceProp.maxGridSize[1], deviceProp.maxGridSize[2]);

        printf("Clock rate: %d\n", deviceProp.clockRate);
        printf("Total constant memory: %llu\n", (unsigned long long)deviceProp.totalConstMem);
        printf("Compute capability: %d.%d\n", deviceProp.major, deviceProp.minor);
        printf("Texture alignment: %llu\n", (unsigned long long)deviceProp.textureAlignment);
        printf("Device overlap: %d\n", deviceProp.deviceOverlap);
        printf("Multiprocessor count: %d\n", deviceProp.multiProcessorCount);

        printf("Kernel execution timeout enabled: %s\n\n",
          deviceProp.kernelExecTimeoutEnabled ? "true" : "false");
      }

      return 0;
    }


    In the program I include the cuda_runtime_api.h header. Strictly speaking this is not necessary, since it is included automatically, but IntelliSense will not work without it (and even with it, IntelliSense still glitches occasionally).


    I think this is the easiest way to write CUDA programs, since setting up the environment takes minimal effort; the only real problem is IntelliSense support.
    Next time, we will look at using CUDA for mathematical calculations and at working with video card memory.

    P.S. Questions are welcome.
