OpenMP C++ - How to parallelize this function?

Posted by 蓝咒 on 2019-12-06 06:18:32

Question


I'd like to parallelize this function, but I'm new to OpenMP and would be grateful if someone could help me:

void my_function(float** A,int nbNeurons,int nbOutput, float* p, float* amp){
   float t=0;
   for(int r=0;r<nbNeurons;r++){
      t+=p[r];
   }

   for(int i=0;i<nbOutput;i++){
      float coef=0;
      for(int r=0;r<nbNeurons;r++){
       coef+=p[r]*A[r][i];
      }
      amp[i]=coef/t;
   }
}

I don't know how to parallelize it properly because of the nested for loops. So far, I have only thought of using: #pragma omp parallel for reduction(+:t)

But I don't think that is the best way to speed up the computation with OpenMP.

Thanks in advance,


Answer 1:


First of all, we need to know the context. Where does your profiler tell you the most time is spent?

In general, coarse-grained parallelization works best, so as @Alex said: parallelize the outer for loop.

void my_function(float** A,int nbNeurons,int nbOutput, float* p, float* amp)
{
    float t=0;
    for(int r=0;r<nbNeurons;r++)
        t+=p[r];

#pragma omp parallel for
    for(int i=0;i<nbOutput;i++){
        float coef=0;
        for(int r=0;r<nbNeurons;r++){
            coef+=p[r]*A[r][i];
        }
        amp[i]=coef/t;
    }
}

Depending on the actual volumes, it may be worthwhile to calculate t in the background and move the division out of the parallel loop:

void my_function(float** A,int nbNeurons,int nbOutput, float* p, float* amp)
{
    float t=0;
#pragma omp parallel shared(amp)
    {
#pragma omp single nowait // only a single thread executes this
        {
            for(int r=0;r<nbNeurons;r++)
                t+=p[r];
        }

#pragma omp for 
        for(int i=0;i<nbOutput;i++){
            float coef=0;
            for(int r=0;r<nbNeurons;r++){
                coef+=p[r]*A[r][i];
            }
            amp[i]=coef;
        }

#pragma omp barrier // redundant but harmless: the omp for above already ends with an implicit barrier
#pragma omp master // only a single thread executes this
        {
            for(int i=0; i<nbOutput; i++){
                amp[i] /= t;
            }
        }
    }
}

Note: this code is untested. OpenMP has tricky semantics sometimes, so I might have missed a 'shared' declaration there. Nothing a profiler won't quickly notify you about, though.
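
Since the code above is untested, one way to gain some confidence is to compare any parallel version against the serial function from the question on the same input. The following is only a sketch: my_function_serial, Fn and max_abs_diff are hypothetical names introduced here, and the input data is an arbitrary deterministic pattern.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Signature shared by every version of my_function in this thread.
using Fn = void (*)(float**, int, int, float*, float*);

// Serial version from the question, used as the reference result.
void my_function_serial(float** A, int nbNeurons, int nbOutput, float* p, float* amp)
{
    float t = 0;
    for (int r = 0; r < nbNeurons; r++)
        t += p[r];
    for (int i = 0; i < nbOutput; i++) {
        float coef = 0;
        for (int r = 0; r < nbNeurons; r++)
            coef += p[r] * A[r][i];
        amp[i] = coef / t;
    }
}

// Hypothetical helper: runs `candidate` and the serial reference on the same
// deterministic input and returns the largest absolute difference in amp.
float max_abs_diff(Fn candidate, int nbNeurons, int nbOutput)
{
    std::vector<std::vector<float>> rows(nbNeurons, std::vector<float>(nbOutput));
    std::vector<float*> A(nbNeurons);
    std::vector<float> p(nbNeurons);
    for (int r = 0; r < nbNeurons; r++) {
        p[r] = 1.0f + (r % 7) * 0.25f;  // strictly positive, so t is never zero
        for (int i = 0; i < nbOutput; i++)
            rows[r][i] = ((r * 31 + i * 17) % 13) / 13.0f;
        A[r] = rows[r].data();
    }

    std::vector<float> expected(nbOutput), actual(nbOutput);
    my_function_serial(A.data(), nbNeurons, nbOutput, p.data(), expected.data());
    candidate(A.data(), nbNeurons, nbOutput, p.data(), actual.data());

    float worst = 0;
    for (int i = 0; i < nbOutput; i++)
        worst = std::max(worst, std::fabs(expected[i] - actual[i]));
    return worst;
}
```

A call such as max_abs_diff(my_function, 1000, 500) should return a value close to zero. Exact equality is not guaranteed once additions are reordered, and a matching result on one run does not rule out a data race, so this complements rather than replaces a race-detection tool.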



Source: https://stackoverflow.com/questions/12143911/openmp-c-how-to-parallelize-this-function
