Question
I'm trying to get orphaned OpenMP directives to work, and to reduce overhead by reducing the number of #pragma omp parallel regions that are entered.
What I'm trying is something like:
#pragma omp parallel default(none) shared(mat,mat2,f,max_iter,tol,N,conv) private(diff,k)
{
    #pragma omp master // I'm not against using #pragma omp single or whatever will work
    {
        while(diff>tol) {
            do_work(mat,mat2,f,N);
            swap(mat,mat2);
            if( !(k%100) ) // Only test stop criteria every 100 iterations
                diff = conv[k] = do_more_work(mat,mat2);
            k++;
        } // end while
    } // end master
} // end parallel
Each call to do_work depends on the previous iteration, so the while-loop has to run sequentially.
But I would like to run the body of do_work in parallel, so it would look something like:
void do_work(double *mat, double *mat2, double *f, int N)
{
    int i,j;
    double scale = 1/4.0;
    #pragma omp for schedule(runtime) // Just so I can test different settings without having to recompile
    for(i=0;i<N;i++)
        for(j=0;j<N;j++)
            mat[i*N+j] = scale*(mat2[(i+1)*N+j]+mat2[(i-1)*N+j] + ... + f[i*N+j]);
}
I hope this can be accomplished somehow; I'm just not sure how. Any help is greatly appreciated (including being told that this isn't possible). By the way, I'm working with OpenMP 3.0, the gcc compiler and the Sun Studio compiler.
Answer 1:
The outer parallel region in your original code contains only a serial piece (#pragma omp master), which makes no sense and effectively results in purely serial execution (no parallelism). Since do_work() depends on the previous iteration but you want to run it in parallel, you must use synchronisation. The OpenMP tool for that is an (explicit or implicit) synchronisation barrier.
For example (code similar to yours):
#pragma omp parallel
for(int j=0; diff>tol; ++j)    // must be the same condition for each thread!
    #pragma omp for            // note: implicit synchronisation after for loop
    for(int i=0; i<N; ++i)
        work(j,i);
Note that the implicit synchronisation ensures that no thread enters the next j while any thread is still working on the current j.
The alternative
for(int j=0; diff>tol; ++j)
    #pragma omp parallel for
    for(int i=0; i<N; ++i)
        work(j,i);
should be less efficient, as it creates a new team of threads at each iteration, instead of merely synchronising.
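Applied to the solver in the question, the single-region approach might look roughly like the sketch below. It is only an illustration under several assumptions: the elided stencil terms are taken to be the left and right neighbours (a 5-point stencil over interior points), the unspecified do_more_work is replaced by a sum of squared differences between the two buffers, and the names jacobi_solve, check_every and sq_sum are hypothetical. Each thread keeps private copies of the buffer pointers and the iteration counter and updates them identically, so every thread evaluates the same loop condition.

```c
#include <assert.h>
#include <string.h>

/* One sweep: writes the interior of dst from src and f.
   The "omp for" is orphaned, so every thread of the enclosing team must
   call this function. ASSUMPTION: 5-point stencil, interior points only. */
void do_work(double *dst, const double *src, const double *f, int N)
{
    const double scale = 1.0 / 4.0;
    int i, j;   /* locals of a function called by each thread: private */

    #pragma omp for schedule(runtime)
    for (i = 1; i < N - 1; i++)
        for (j = 1; j < N - 1; j++)
            dst[i*N + j] = scale * (src[(i+1)*N + j] + src[(i-1)*N + j]
                                  + src[i*N + j + 1] + src[i*N + j - 1]
                                  + f[i*N + j]);
    /* implicit barrier: dst is complete before any thread returns */
}

/* Hypothetical driver. Returns the number of sweeps performed and leaves
   the newest iterate in mat. Both buffers must hold the boundary values. */
int jacobi_solve(double *mat, double *mat2, const double *f,
                 int N, double tol, int max_iter, int check_every)
{
    double sq_sum = 0.0;   /* shared reduction target */
    int iters = 0;         /* shared, written once at the end */

    assert(N >= 3 && max_iter > 0 && check_every > 0);

    #pragma omp parallel shared(sq_sum, iters)
    {
        /* Private per thread, but updated identically by all threads. */
        double *dst = mat, *src = mat2, *tmp;
        double d = tol * tol + 1.0;
        int k = 0, i;

        while (d > tol * tol && k < max_iter) {
            do_work(dst, src, f, N);
            tmp = dst; dst = src; src = tmp;   /* src is now the newest iterate */
            k++;

            if (k % check_every == 0) {
                #pragma omp single
                sq_sum = 0.0;                  /* implicit barrier after single */

                /* ASSUMPTION: stand-in for do_more_work */
                #pragma omp for reduction(+:sq_sum)
                for (i = 0; i < N * N; i++) {
                    double e = src[i] - dst[i];
                    sq_sum += e * e;
                }                              /* implicit barrier: sum complete */
                d = sq_sum;                    /* same value seen by every thread */
            }
        }

        #pragma omp single
        {
            iters = k;
            if (src != mat)                    /* newest iterate ended up in mat2 */
                memcpy(mat, src, (size_t)N * N * sizeof *mat);
        }
    }
    return iters;
}
```

With gcc this would be built with -fopenmp, and the loop schedule can then be chosen at run time through the OMP_SCHEDULE environment variable. Without -fopenmp the pragmas are ignored and the same code runs serially, which is a convenient way to check the results.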
Source: https://stackoverflow.com/questions/14384959/pragma-omp-for-inside-pragma-omp-master-or-single