I have seen an algorithm for parallel merge sort in a paper. This is the code:
void mergesort_parallel_omp(int a[], int size, int temp[], int threads)
{
    if (threads == 1) {
        mergesort_serial(a, size, temp);
    }
    else if (threads > 1)
    {
        #pragma omp parallel sections
        {
            #pragma omp section
            mergesort_parallel_omp(a, size/2, temp, threads/2);
            #pragma omp section
            mergesort_parallel_omp(a + size/2, size - size/2, temp + size/2, threads - threads/2);
        }
        merge(a, size, temp);
    } // threads > 1
}
I ran it on a multicore machine. What happens is that at the leaves of the recursion tree, two threads run in parallel. After they finish their work, two other threads start, and so on, even when there are free cores for all the leaf nodes.
I think the reason is that this OpenMP code does not create parallel regions inside parallel regions. Am I correct?
"I think the reason is that OpenMP cannot create parallel regions inside parallel regions"
You can have a parallel region inside a parallel region.
OpenMP parallel regions can be nested inside each other. If nested parallelism is disabled, then the new team created by a thread encountering a parallel construct inside a parallel region consists only of the encountering thread. If nested parallelism is enabled, then the new team may consist of more than one thread (source).
In order to run your code correctly, you need to call omp_set_nested(1) and omp_set_num_threads(2). Nested parallelism can be enabled or disabled by setting the OMP_NESTED environment variable or by calling the omp_set_nested() function.
The modern answer to this question is to use tasks instead of sections. Tasks were added in OpenMP 3.0 (2009) and work better/easier than nested parallelism and sections, because nested parallelism can lead to oversubscription (more active threads than available CPUs), which causes significant performance degradation. With tasks, you have one team of threads matching the number of CPUs, and they work on the tasks. So you do not need the manual handling with the threads parameter. A simple solution looks like this:
// spawn the parallel region once, outside the recursion
void mergesort_omp(...) {
    #pragma omp parallel
    #pragma omp single
    mergesort_parallel_omp(...);
}
void mergesort_parallel_omp(int a[], int size, int temp[])
{
    if (size < 2) return; // base case; without it the recursion never terminates
    #pragma omp task
    mergesort_parallel_omp(a, size/2, temp);
    mergesort_parallel_omp(a + size/2, size - size/2, temp + size/2);
    #pragma omp taskwait
    merge(a, size, temp);
}
However, creating tasks for very small chunks of work can still be problematic, so it is useful to limit the parallelism based on the work granularity, e.g. like this:
void mergesort_parallel_omp(int a[], int size, int temp[])
{
    if (size < size_threshold) {
        mergesort_serial(a, size, temp);
        return;
    }
    #pragma omp task
    mergesort_parallel_omp(a, size/2, temp);
    mergesort_parallel_omp(a + size/2, size - size/2, temp + size/2);
    #pragma omp taskwait
    merge(a, size, temp);
}
Maybe I am totally missing the point here... but are you aware that you need to set the environment variable OMP_NUM_THREADS if you want to execute on more than 2 threads?
Source: https://stackoverflow.com/questions/13811114/parallel-merge-sort-in-openmp