Why is the OpenMP version slower?


Question


I am experimenting with OpenMP. I wrote some code to check its performance. On a single 4-core Intel CPU running Kubuntu 11.04, the following program compiled with OpenMP is around 20 times slower than the same program compiled without OpenMP. Why?

I compiled it with: g++ -g -O2 -funroll-loops -fomit-frame-pointer -march=native -fopenmp

#include <math.h>
#include <iostream>

using namespace std;

int main ()
{
  long double i=0;
  long double k=0.7;

  #pragma omp parallel for reduction(+:i)
  for(int t=1; t<300000000; t++){       
    for(int n=1; n<16; n++){
      i=i+pow(k,n);
    }
  }

  cout << i<<"\t";
  return 0;
}

Answer 1:


The problem is that the variable k is considered to be a shared variable, so it has to be synchronized between the threads. A possible way to avoid this is:

#include <math.h>
#include <iostream>

using namespace std;

int main ()
{
  long double i=0;

#pragma omp parallel for reduction(+:i)
  for(int t=1; t<30000000; t++){       
    long double k=0.7;
    for(int n=1; n<16; n++){
      i=i+pow(k,n);
    }
  }

  cout << i<<"\t";
  return 0;
}

Following Martin Beckett's hint in the comments: instead of declaring k inside the loop, you can also declare k as const and outside the loop.
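A minimal sketch of that variant (the only change relative to the code above is the const declaration of k outside the parallel loop):

#include <math.h>
#include <iostream>

using namespace std;

int main ()
{
  long double i = 0;
  const long double k = 0.7;   // const and outside the loop: read-only for all threads

  #pragma omp parallel for reduction(+:i)
  for (int t = 1; t < 30000000; t++) {
    for (int n = 1; n < 16; n++) {
      i = i + pow(k, n);
    }
  }

  cout << i << "\t";
  return 0;
}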

Otherwise, ejd is correct: the problem here does not seem to be bad parallelization, but rather bad optimization when the code is parallelized. Remember that gcc's OpenMP implementation is still fairly young and far from optimal.
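Just to make the optimization point concrete: the inner sum over pow(k, n) does not depend on t, so a serial build can effectively hoist or fold it. A hedged sketch that performs that hoist by hand (this is an illustration, not what the original answer proposes):

#include <math.h>
#include <iostream>

using namespace std;

int main ()
{
  const long double k = 0.7;

  // The inner sum is loop-invariant, so compute it once up front.
  long double inner = 0;
  for (int n = 1; n < 16; n++)
    inner += pow(k, n);

  long double i = 0;
  #pragma omp parallel for reduction(+:i)
  for (int t = 1; t < 300000000; t++)
    i += inner;   // each iteration only accumulates a precomputed constant

  cout << i << "\t";
  return 0;
}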




Answer 2:


Fastest code:

for (int i = 0; i < 100000000; i ++) {;}

Slightly slower code:

#pragma omp parallel for num_threads(1)
for (int i = 0; i < 100000000; i ++) {;}

2-3 times slower code:

#pragma omp parallel for
for (int i = 0; i < 100000000; i ++) {;}

no matter what is between { and }. A simple ; or a more complex computation gives the same results. I compiled under 64-bit Ubuntu 13.10, using both gcc and g++, trying different parameters (-ansi -pedantic-errors -Wall -Wextra -O3), and running on an Intel quad-core at 3.5 GHz.

I guess thread management overhead is at fault? It doesn't seem smart for OpenMP to create a thread every time you need one and destroy it afterwards. I thought there would be four (or eight) threads either running whenever needed or sleeping.
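A hedged sketch for reproducing this comparison; the wall-clock measurement via omp_get_wtime and the printf harness are assumptions, not part of the original post. Compile with -fopenmp; note that at -O3 the compiler may remove the empty serial loop entirely, which is consistent with the "fastest code" observation above.

#include <omp.h>
#include <cstdio>

int main ()
{
  const int N = 100000000;

  double t0 = omp_get_wtime();
  for (int i = 0; i < N; i++) {;}           // plain serial loop
  double t1 = omp_get_wtime();

  #pragma omp parallel for num_threads(1)
  for (int i = 0; i < N; i++) {;}           // OpenMP, single thread
  double t2 = omp_get_wtime();

  #pragma omp parallel for
  for (int i = 0; i < N; i++) {;}           // OpenMP, default number of threads
  double t3 = omp_get_wtime();

  printf("serial: %.3fs  num_threads(1): %.3fs  default: %.3fs\n",
         t1 - t0, t2 - t1, t3 - t2);
  return 0;
}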




Answer 3:


I am observing similar behavior on GCC. However, I am wondering whether in my case it is somehow related to templates or inline functions. Is your code also within a template or inline function? Please look here.

However, for very short for loops you may observe some small overhead related to thread management, as in your case:

#pragma omp parallel for
for (int i = 0; i < 100000000; i ++) {;}

If your loop runs for a significantly long time, a few milliseconds or even seconds, you should observe a performance boost when using OpenMP, but only when you have more than one CPU core. The more cores you have, the higher the performance you reach with OpenMP.
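A hedged sketch of a loop with enough per-iteration work to amortize the parallel overhead; the sin/sqrt workload, array size, and timing harness are illustrative assumptions:

#include <omp.h>
#include <cmath>
#include <cstdio>
#include <vector>

int main ()
{
  const int N = 10000000;
  std::vector<double> v(N);

  double t0 = omp_get_wtime();
  #pragma omp parallel for
  for (int i = 0; i < N; i++) {
    // Enough work per iteration that the threading overhead becomes negligible.
    v[i] = std::sin(i * 0.001) * std::sqrt(i + 1.0);
  }
  double t1 = omp_get_wtime();

  printf("elapsed: %.3fs with up to %d threads\n", t1 - t0, omp_get_max_threads());
  return 0;
}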



Source: https://stackoverflow.com/questions/6506987/why-openmp-version-is-slower
