Concurrent Bubble Sort on CUDA

  • Tutorial
Hello, Habr. I thought someone would need parallel sorting with a relatively simple implementation and high performance on the CUDA platform. This is bubble sorting. Under the cat is an explanation and code that may come in handy (or maybe not ...). I must say right away that the presented program is a benchmark in comparison of performance on the GPU and CPU. If you do not mind the reader, then compile it, please, and put the calculation results in the comments of this article. This is not for science. Just interesting =)


Let me remind you a bit


The essence of the bubble sorting method is pairwise comparing the elements and replacing them in the event that a certain condition is met, depending on the direction of sorting. The iterations of the algorithm in single-threaded mode are shown below, when it sorts an increasing array of integers in descending order (each line is the state of the array at the moment) - this is table 1.

Table 2 shows the iterations of sorting by a bubble in multi-threaded mode, which is implemented very simply: while one element moves to the side, then after 2 movements the next element can also be sorted by the bubble method and begins to move, and after another 2 - another, etc. Thus, the entire array begins to “pop up”, which significantly reduces the calculation time.

Profit


Already by the number of actions taken, you can see that parallel sorting is faster. But this is not always the case: with small sizes of arrays, single-threaded mode is faster due to the absence of problems with data transfer. I did some test calculations here, and here is their result:
image

The abscissa axis is the number of elements, the ordinates are the execution time in seconds.


Implementation


I give an implementation code in the CUDA C language, which contains the code that performs SECURITY SECURITY sorting on CUDA and CPU (in single-threaded mode). In fact, this is a benchmark for your computer about how smartly sorting works on the video card and on the process. Part of this program is the sorting function on a video card on the CUDA platform, which you can use on health in your masterpieces with elementary copy'n'past. The code is presented below the tables.

How to compile it ???


Well, for this you need to download the drivers for CUDA from the Nvidia website, install them, download the Nvidia Toolkit, install and the Nvidia SDK, also install. Fortunately, now everything is already in one package. I will write how to install CUDA if there is a demand, but I just wish good luck if you are a beginner.
When everything is ready and nvidia CUDA works on your computer, then just compile the main.cu file (or whatever it was called there) and run the program, get the result and put it here.
Wat and that's it for now, thanks for reading. See you later.
PS I would be grateful if you tell me how to speed up the algorithm =)

Table 1: Single Sort Bubble Sort


012345678910eleven12thirteen14
102345678910eleven12thirteen14
120345678910eleven12thirteen14
123045678910eleven12thirteen14
123405678910eleven12thirteen14
123450678910eleven12thirteen14
123456078910eleven12thirteen14
123456708910eleven12thirteen14
123456780910eleven12thirteen14
123456789010eleven12thirteen14
123456789100eleven12thirteen14
12345678910eleven012thirteen14
12345678910eleven120thirteen14
12345678910eleven12thirteen0
12345678910eleven12thirteen140
12345678910eleven12thirteen14
21345678910eleven12thirteen140
23145678910eleven12thirteen140
23415678910eleven12thirteen140
23451678910eleven12thirteen140
23456178910eleven12thirteen140
23456718910eleven12thirteen140
23456781910eleven12thirteen140
23456789110eleven12thirteen140
23456789101eleven12thirteen140
2345678910eleven112thirteen140
2345678910eleven121thirteen140
2345678910eleven12thirteen1140
2345678910eleven12thirteen141
2345678910eleven12thirteen14
3245678910eleven12thirteen1410
3425678910eleven12thirteen1410
3452678910eleven12thirteen1410
3456278910eleven12thirteen1410
3456728910eleven12thirteen1410
3456782910eleven12thirteen1410
3456789210eleven12thirteen1410
3456789102eleven12thirteen1410
345678910eleven212thirteen1410
345678910eleven122thirteen1410
345678910eleven12thirteen21410
345678910eleven12thirteen14210
345678910eleven12thirteen14
435678910eleven12thirteen14210
453678910eleven12thirteen14210
456378910eleven12thirteen14210
456738910eleven12thirteen14210
456783910eleven12thirteen14210
456789310eleven12thirteen14210
456789103eleven12thirteen14210
45678910eleven312thirteen14210
45678910eleven123thirteen14210
45678910eleven12thirteen314210
45678910eleven12thirteen143210
45678910eleven12thirteen1410
54678910eleven12thirteen143210
56478910eleven12thirteen143210
56748910eleven12thirteen143210
56784910eleven12thirteen143210
56789410eleven12thirteen143210
56789104eleven12thirteen143210
5678910eleven412thirteen143210
5678910eleven124thirteen143210
5678910eleven12thirteen4143210
5678910eleven12thirteen1443210
5678910eleven12thirteen14210
6578910eleven12thirteen1443210
6758910eleven12thirteen1443210
6785910eleven12thirteen1443210
6789510eleven12thirteen1443210
6789105eleven12thirteen1443210
678910eleven512thirteen1443210
678910eleven125thirteen1443210
678910eleven12thirteen51443210
678910eleven12thirteen14543210
678910eleven12thirteen143210
768910eleven12thirteen14543210
786910eleven12thirteen14543210
789610eleven12thirteen14543210
789106eleven12thirteen14543210
78910eleven612thirteen14543210
78910eleven126thirteen14543210
78910eleven12thirteen614543210
78910eleven12thirteen146543210
78910eleven12thirteen1443210
87910eleven12thirteen146543210
89710eleven12thirteen146543210
89107eleven12thirteen146543210
8910eleven712thirteen146543210
8910eleven127thirteen146543210
8910eleven12thirteen7146543210
8910eleven12thirteen1476543210
8910eleven12thirteen14543210
9810eleven12thirteen1476543210
9108eleven12thirteen1476543210
910eleven812thirteen1476543210
910eleven128thirteen1476543210
910eleven12thirteen81476543210
910eleven12thirteen14876543210
910eleven12thirteen146543210
109eleven12thirteen14876543210
10eleven912thirteen14876543210
10eleven129thirteen14876543210
10eleven12thirteen914876543210
10eleven12thirteen149876543210
10eleven12thirteen1476543210
eleven1012thirteen149876543210
eleven1210thirteen149876543210
eleven12thirteen10149876543210
eleven12thirteen14109876543210
eleven12thirteen14876543210
12eleventhirteen14109876543210
12thirteeneleven14109876543210
12thirteen14eleven109876543210
12thirteen149876543210
thirteen1214eleven109876543210
thirteen1412eleven109876543210
thirteen14109876543210
14thirteen12eleven109876543210


Table 2: Bubble Sort in Multithreaded Mode


012345678910eleven12thirteen14
012345678910eleven12thirteen14
012345678910eleven12thirteen14
102345678910eleven12thirteen14
120345678910eleven12thirteen14
213045678910eleven12thirteen14
231405678910eleven12thirteen14
324150678910eleven12thirteen14
342516078910eleven12thirteen14
435261708910eleven12thirteen14
453627180910eleven12thirteen14
546372819010eleven12thirteen14
564738291100eleven12thirteen14
65748392101eleven012thirteen14
6758493102eleven1120thirteen14
768594103eleven2121thirteen014
78695104eleven3122thirteen1140
8796105eleven4123thirteen21410
897106eleven5124thirteen314210
98107eleven6125thirteen4143210
9108eleven7126thirteen51443210
109eleven8127thirteen614543210
10eleven9128thirteen7146543210
eleven10129thirteen81476543210
eleven1210thirteen914876543210
12eleventhirteen10149876543210
12thirteeneleven14109876543210
thirteen1214eleven109876543210
thirteen1412eleven109876543210
14thirteen12eleven109876543210
14thirteen12eleven109876543210



The code:


#include 
#include 
__global__ void bubbleMove(int *array_device, int N, int step){
  int idx = blockDim.x * blockIdx.x + threadIdx.x;
  if (idx<(N-1)) {
    if (step-2>=idx){
      if (array_device[idx]>>(array_device,N,step);
    cudaThreadSynchronize();
  }
  cudaMemcpy(array_host,array_device,N*sizeof(int),cudaMemcpyDeviceToHost);
  cudaFree(array_device);
}
void bubleSortCPU(int *array_host, int N){
  for (int i = 0; i < N; i++){
    for (int j = 0; j < N-i-1; j++) {
      if (array_host[j]

Also popular now: