Parallelizing recursive functions using OpenMP 3.0 tasks
I recently stumbled upon the blog post “OpenMP 3.0 tasking model: as smooth as Cilk?” and decided to check how well recursive functions parallelize with the OpenMP 3.0 task construct. Let me remind you that before the third version of the standard there was no support for dynamic or irregular parallelism (for example, while loops or recursive functions).
Computing Systems
I'll start with a description of the computing systems that I used.
Below is the list of machines for which I am able to publish results; the rest have not yet gone into production.
Parallelization process
For testing, I took the classic example of computing Fibonacci numbers. The parallelization idea was as follows: spawn an asynchronous task for each step of the recursion, then synchronize before leaving the function. Unfortunately, with this approach I got no speedup above 1: the runtime overhead dominated. The solution, found here, turned out to be simple: create tasks only for part of the recursion steps and execute the rest serially; in other words, balance the runtime overhead against the useful work. Below is the piece of code responsible for this balancing; SPLITTER is the threshold that controls it. All source code can be downloaded here.
INT_TYPE
fib( INT_TYPE n ) {
    INT_TYPE i, j;
    if( n < SPLITTER ) {
        /* below the threshold: plain serial recursion, no task overhead */
        return fib_serial( n );
    } else {
        /* spawn the two sub-problems as independent tasks */
        #pragma omp task shared( i )
        i = fib( n - 1 );
        #pragma omp task shared( j )
        j = fib( n - 2 );
        /* wait for both child tasks before combining the results */
        #pragma omp taskwait
        return i + j;
    }
}
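The snippet above is only the recursive core. For completeness, here is a minimal sketch of the driver it needs; the INT_TYPE typedef, the body of fib_serial, and the main function are my assumptions, and the downloadable source surely differs in details:

#include <stdio.h>

typedef unsigned long long INT_TYPE;   /* assumed; the real typedef may differ */

INT_TYPE fib( INT_TYPE n );            /* the task-parallel version shown above */

/* plain recursion used below the SPLITTER threshold */
INT_TYPE
fib_serial( INT_TYPE n ) {
    return n < 2 ? n : fib_serial( n - 1 ) + fib_serial( n - 2 );
}

int
main( void ) {
    INT_TYPE result;
    /* one thread creates the root task; the whole team then works on the task pool */
    #pragma omp parallel
    {
        #pragma omp single
        result = fib( 41 );
    }
    printf( "fib(41) = %llu\n", result );
    return 0;
}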
But it is not that simple: the parallelization efficiency differs for different SPLITTER values. More or less by guesswork, I picked two splitter values, 19 and 32, at which the parallelization efficiency approaches unity.
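Presumably SPLITTER is fixed at compile time. One convenient way to build both variants (my assumption about how the source is organized, not a quote from it) is to make the constant overridable:

#ifndef SPLITTER
#define SPLITTER 19   /* spawn tasks only while n >= SPLITTER */
#endif

and then compile each variant with, for example, gcc ./fib_task.c -fopenmp -DSPLITTER=32.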
Compilation
Of all the software, only the compilers are of interest. I had three at my disposal: Intel® Composer XE for Linux, Microsoft Visual C++ 2010, and GCC. Visual C++ is ruled out immediately, since it does not support OpenMP 3.0 (at the moment it only supports OpenMP 2.5). GCC has supported OpenMP 3.0 since version 4.4; I used 4.5.0.
Below are the compilation times and the sizes of the executables, linked both against dynamic libraries and statically. It is worth noting that, unlike GCC, Intel® Composer XE is installed on a remote machine, which inflates its compilation time.
The command lines for GCC and Intel® Composer XE are as follows:
# gcc ./fib_task.c -fopenmp
# gcc ./fib_task.c -fopenmp -static
# icc ./fib_task.c -openmp -static
# icc ./fib_task.c -openmp
Runtime and Scalability
All measurements computed the 50th Fibonacci number, for SPLITTER = 19 and SPLITTER = 32.
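A typical measurement run looked something like this (my reconstruction; OMP_NUM_THREADS is the standard OpenMP way to set the thread count, and a.out is the default output name from the commands above):

# export OMP_NUM_THREADS=8
# time ./a.out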
Speedup and parallelization efficiency
This graph shows the speedup for SPLITTER = 32. Efficiency, in turn, is the ratio of the speedup to the number of threads.
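In symbols (the standard definitions; my restatement, not taken from the original charts): with $T_1$ the single-thread runtime and $T_p$ the runtime on $p$ threads,

$$S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}$$

so $E_p$ close to 1 means near-linear scaling.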
Source and executable files
As mentioned above, the source file can be downloaded here. As shipped, the example computes the 41st Fibonacci number. To reproduce the measurements, change the variable n to 50 and the variable expected to 12586269025UL (see the two-line sketch after the download list). You can also immediately download
- executable files generated with Intel® Composer XE for SPLITTER = 19 and SPLITTER = 32;
- executable files generated with GCC 4.5.0 for SPLITTER = 19 and SPLITTER = 32.
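The change itself is two lines; the variable names come from the text, while the surrounding declarations are my assumption:

INT_TYPE n = 50;                    /* was 41 */
INT_TYPE expected = 12586269025UL;  /* F(50), used as a self-check */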
Conclusions
The charts of runtimes and speedups largely speak for themselves.
- For Composer XE, the parallelization efficiency is ~1 as long as the number of threads does not exceed the number of physical cores.
- For GCC 4.5.0, the parallelization efficiency is ~0.5 as long as the number of threads does not exceed the number of physical cores.
I leave the remaining conclusions to you, since everyone is interested in different aspects.
Afterword
I found a paper on SpringerLink, “An Extension to Improve OpenMP Tasking Control”, which proposes introducing a new final clause that acts as a balancer. In my opinion, this is a sensible proposal. One could go further and add balancing clauses that perform adaptive balancing at run time.
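For illustration, here is how such a clause can subsume the manual SPLITTER branch. The sketch below uses the final clause syntax that was later standardized in OpenMP 3.1; whether it matches the paper's exact proposal is my assumption:

INT_TYPE
fib( INT_TYPE n ) {
    INT_TYPE i, j;
    if( n < 2 )
        return n;
    /* once n < SPLITTER, the task becomes final: it and all of its
       descendants execute immediately and serially, so no further
       task-scheduling overhead is incurred */
    #pragma omp task shared( i ) final( n < SPLITTER )
    i = fib( n - 1 );
    #pragma omp task shared( j ) final( n < SPLITTER )
    j = fib( n - 2 );
    #pragma omp taskwait
    return i + j;
}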
Please refer to the Optimization Notice page for more information on performance and optimization in Intel software products.