OpenMP divide for loop over cores

Submitted by 廉价感情 on 2019-12-14 03:57:18

Question


I am trying to execute an application in parallel using SSE instructions and OpenMP. For the OpenMP part I have code like:

for(r=0; r<end_condition; r++){
    .. several nested for loops inside ..
}

I want to divide this loop over r across multiple cores; for example, with two cores, one core should execute r=0 .. r=end_condition/2-1 and the other r=end_condition/2 .. r=end_condition-1. There is no communication between iterations of the loop, so they can run in parallel; at the end of the r loop the results should be synchronized.

How can I divide the loop across the cores this way using OpenMP directives? Do I have to unroll the loop over r and use OpenMP sections?

Thanks in advance


Answer 1:


You can achieve this by adding:

#pragma omp parallel for
for(r=0; r<end_condition; r++){
    .. several nested for loops inside ..
}

You need to work out what is shared and what is private in your loop, though. Note that this does not guarantee that r is divided exactly as you described. If you want that explicit split, you could use tasks, but doing it by hand is not really convenient and I cannot recommend it.




Answer 2:


With the following code the compiler generates a parallel region, which is executed by N threads.

omp_set_num_threads(N);

#pragma omp parallel for
for(int r = 0; r < end_condition; ++r)
{
    .. several nested for loops inside ..
}

Each thread executes a subset of the end_condition iterations. Note that the loop variable r is now declared inside the omp parallel for scope, so each thread has its own copy.

The same goal can be achieved using the parallel pragma instead of parallel for, like this:

omp_set_num_threads(N);
#pragma omp parallel private(r)
{
   int tid = omp_get_thread_num();
   for(r = (end_condition/N) * tid; r < (end_condition/N) * (tid+1) ; ++r)
   {
    .. several nested for loops inside ..
   }
}

Of course this only works when end_condition % N == 0, but you should get the idea. Here the variable r is explicitly marked as private to the thread and can be declared wherever you want. The compiler will generate a copy for each thread.




Answer 3:


You can set the number of threads the for loop should create, and for each thread you can specify the chunk size.




Answer 4:


I can only add that you might have problems if different iterations of the loop take different amounts of time; in that case you'd want to add schedule(dynamic):

#pragma omp parallel for schedule (dynamic)
for(r=0; r<end_condition; r++){
    .. several nested for loops inside ..
}

Also, note that a barrier is automatically added at the end of the loop, so you can be sure execution only continues once all iterations are completed. If that is not desired (you have other work to do in parallel with the loop), add nowait to the for directive's parameters. You can then request synchronization with #pragma omp barrier.



Source: https://stackoverflow.com/questions/8312563/openmp-divide-for-loop-over-cores
