nested loops, inner loop parallelization, reusing threads

让人想犯罪 __ 提交于 2019-12-04 07:57:04

To sum up the parallelization in OpenMP for this particular case: It is not worth it.

Why? Operations in the inner loops are simple. Code was compiled with -O3, so max() call was probably substituted with the code of function body. Overhead in implicit barrier is probably high enough, to compensate the performance gain, and overall overhead is high enough to make the parallel code even slower than the sequential code was. I also found out, there is no real performance gain in such construct:

#pragma omp parallel private(i,j)
{ 
   for (i = 1; i < n; i++)
   {   
      #pragma omp for
      for (j = 0; j < m; j++)
         x[i][j] = x[i-1][j];
   }
}

because it's performance is similar to this one

for (i = 1; i < n; i++)
{   
   #pragma omp parallel for private(j)
   for (j = 0; j < m; j++)
      x[i][j] = x[i-1][j];
}

thanks to built-in thread reusing in GCC libgomp, according to this article: http://bisqwit.iki.fi/story/howto/openmp/

Since the outer loop cannot be paralellized (without ordered option) it looks there is no way to significantly improve performance of the program in question using OpenMP. If someone feels I did something wrong, and it is possible, I'll be glad to see and test the solution.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!