OpenMP with 1 thread slower than sequential version

问题

I have implemented knapsack using OpenMP (gcc version 4.6.3)

#define MAX(x,y)   ((x)>(y) ? (x) : (y))
#define table(i,j)    table[(i)*(C+1)+(j)]

   for(i=1; i<=N; ++i) {
#pragma omp parallel for
      for(j=1; j<=C; ++j) {
         if(weights[i]>j) {
            table(i,j) = table(i-1,j);
         }else {
            table(i,j) = MAX(profits[i]+table(i-1,j-weights[i]), table(i-1,j));
         }
      }
   }

execution time for the sequential program = 1s

execution time for the openmp with 1 thread = 1.7s (overhead = 40%)

Used the same compiler optimization flags (-O3) in the both cases.

Can someone explain the reason behind this behavior.

Thanks.

回答1:

Enabling OpenMP inhibits certain compiler optimisations, e.g. it could prevent loops from being vectorised or shared variables from being kept in registers. Therefore OpenMP-enabled code is usually slower than the serial and one has to utilise the available parallelism to offset this.

That being said, your code contains a parallel region nested inside the outer loop. This means that the overhead of entering and exiting the parallel region is multiplied N times. This only makes sense if N is relatively small and C is significantly larger (like orders of magnitude larger) than N, therefore the work being done inside the region greatly outweighs the OpenMP overhead.

来源：https://stackoverflow.com/questions/21999756/openmp-with-1-thread-slower-than-sequential-version

标签

performance

openmp

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!