Parallelizing many nested for loops in OpenMP (C++)

Submitted by 会有一股神秘感。 on 2021-01-28 11:14:38


Hi, I am new to C++. I wrote code that runs, but it is slow because of many nested for loops, and I want to speed it up with OpenMP. Can anyone guide me? I tried putting '#pragma omp parallel' before the ip loop and '#pragma omp parallel for' before the it loop inside it, but it does not work.

    #pragma omp parallel
    for(int ip=0; ip !=nparticle; ip++){
        double para[7]={0,0,Vz,x0-xp,y0-yp,z0-zp,0};
        if(ip>=0 && ip<=43){
            #pragma omp parallel for
            for(int it=0;it<NT;it++){
                for(int ix=0;ix<NumX;ix++){
                    for(int iy=0;iy<NumY;iy++){
                        for(int iz=0;iz<NumZ;iz++){
                            int position=it*NumX*NumY*NumZ+ix*NumY*NumZ+iy*NumZ+iz;
                            MagX[position] +=chg*Field[3*position];
                            MagY[position] +=chg*Field[3*position+1];
                            MagZ[position] +=chg*Field[3*position+2];
                        }
                    }
                }
            }
        }
    }

My rotation function also has an unbounded integration loop, shown below:

for(int i=1;;i++){
    gsl_integration_qag(&F, 10*i, 10*i+10, 1.0e-8, 1.0e-8, 100, 2, w, &temp, &error);
    // ... (accumulation and exit condition not shown in the question)
}

I am also using the GSL library. How can I speed this up, or how should I apply OpenMP here?


If the loop iterations are independent (no loop-carried dependences), you can use the collapse clause to parallelize several perfectly nested loops together. Example:

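#include <vector>
// A and B are N x M matrices stored row-major in flat vectors
void scale(int N, int M, std::vector<float>& A, const std::vector<float>& B, float alpha) {
  #pragma omp parallel for collapse(2)
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < M; j++) {
      A[i * M + j] = alpha * B[i * M + j];
    }
  }
}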
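
Applied to the question's grid loops, a sketch might look like the following. It assumes one particle's contribution per call; accumulate_field is a hypothetical helper name, and std::vector stands in for whatever arrays the real code uses. The four loops are perfectly nested and each iteration writes its own position, so collapse(4) is legal here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: adds one particle's chg * Field into MagX/MagY/MagZ.
// Grid layout follows the question:
//   position = it*NumX*NumY*NumZ + ix*NumY*NumZ + iy*NumZ + iz
// and Field stores 3 components (x, y, z) per position.
void accumulate_field(int NT, int NumX, int NumY, int NumZ, double chg,
                      const std::vector<double>& Field,
                      std::vector<double>& MagX, std::vector<double>& MagY,
                      std::vector<double>& MagZ) {
    const std::size_t n = static_cast<std::size_t>(NT) * NumX * NumY * NumZ;
    assert(Field.size() == 3 * n && MagX.size() == n &&
           MagY.size() == n && MagZ.size() == n);
    // Every iteration writes a distinct position, so collapse(4) can split
    // all NT*NumX*NumY*NumZ iterations across the threads without a race.
    #pragma omp parallel for collapse(4)
    for (int it = 0; it < NT; it++)
        for (int ix = 0; ix < NumX; ix++)
            for (int iy = 0; iy < NumY; iy++)
                for (int iz = 0; iz < NumZ; iz++) {
                    const int position =
                        it * NumX * NumY * NumZ + ix * NumY * NumZ + iy * NumZ + iz;
                    MagX[position] += chg * Field[3 * position];
                    MagY[position] += chg * Field[3 * position + 1];
                    MagZ[position] += chg * Field[3 * position + 2];
                }
}
```

Because position runs through 0..n-1 in order, a single loop over position under '#pragma omp parallel for' would be equivalent and simpler. Compiled without OpenMP, the pragma is ignored and the function gives the same result serially.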

I suggest checking out the OpenMP C/C++ cheat sheet (PDF), which summarizes the directives and clauses for loop parallelization.


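Do not put a parallel pragma inside another parallel region. If nested parallelism is enabled, you can oversubscribe the machine by creating more threads than it has cores; if it is disabled, the inner region simply runs on one thread and gains nothing. Note also that '#pragma omp parallel' on its own does not split the ip loop: every thread executes all of its iterations. I would parallelize only the outer loop (if it has enough iterations):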

#pragma omp parallel for
for (int ip = 0; ip < nparticle; ip++) { // '<' rather than '!=': OpenMP before 5.0 rejects '!=' here
    // ... body of the ip loop ...
}
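
If you want to keep a parallel region around the ip loop as in the question, one way to avoid nesting is to create the thread team once and use a worksharing 'omp for' (not 'omp parallel for') inside it. This sketch assumes, hypothetically, a per-particle charge chg[ip] and a single Field array shared by all particles; accumulate_all is an illustrative name, not from the question.

```cpp
#include <cassert>
#include <vector>

// chg[ip]: hypothetical per-particle charge.
// Field: 3 components per grid point, assumed shared by all particles.
void accumulate_all(const std::vector<double>& chg,
                    const std::vector<double>& Field,
                    std::vector<double>& MagX, std::vector<double>& MagY,
                    std::vector<double>& MagZ) {
    assert(Field.size() == 3 * MagX.size() &&
           MagY.size() == MagX.size() && MagZ.size() == MagX.size());
    const int n = static_cast<int>(MagX.size());
    const int nparticle = static_cast<int>(chg.size());
    #pragma omp parallel            // one team, created once
    for (int ip = 0; ip < nparticle; ip++) {   // every thread walks all particles...
        #pragma omp for                       // ...but each takes a share of the grid
        for (int position = 0; position < n; position++) {
            MagX[position] += chg[ip] * Field[3 * position];
            MagY[position] += chg[ip] * Field[3 * position + 1];
            MagZ[position] += chg[ip] * Field[3 * position + 2];
        }   // implicit barrier: all threads finish particle ip before moving on
    }
}
```

The implicit barrier at the end of 'omp for' keeps all threads on the same particle, so no two threads update the same MagX element at once.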

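Also make sure there are no race conditions between threads (e.g. read-after-write (RAW) hazards). For example, if every ip iteration adds into the same MagX[position] elements, parallelizing the ip loop as written lets two threads update the same element at the same time.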
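
If you do parallelize over particles while they all add into the same grid array, an array-section reduction (OpenMP 4.5 and later) is one way to remove the race: each thread accumulates into a private copy, and the copies are summed when the loop ends. This is a minimal sketch that assumes, hypothetically, only the x component and a FieldX array laid out as FieldX[ip*n + position]; accumulate_particles is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: FieldX[ip * n + position] is particle ip's field at
// grid point position; chg[ip] is that particle's charge.
std::vector<double> accumulate_particles(const std::vector<double>& chg,
                                         const std::vector<double>& FieldX,
                                         int n) {
    assert(n >= 0 && FieldX.size() == chg.size() * static_cast<std::size_t>(n));
    const int nparticle = static_cast<int>(chg.size());
    std::vector<double> MagX(n, 0.0);
    double* mag = MagX.data();  // reduction needs a pointer/array section, not a vector
    // Every ip adds into the same mag elements; the reduction gives each
    // thread a private mag[0..n) and sums the copies at the end.
    #pragma omp parallel for reduction(+ : mag[:n])
    for (int ip = 0; ip < nparticle; ip++) {
        for (int position = 0; position < n; position++) {
            mag[position] += chg[ip] * FieldX[static_cast<std::size_t>(ip) * n + position];
        }
    }
    return MagX;
}
```

Each thread holds its own n-element copy, so this trades memory for the removed race. Floating-point results can differ in the last bits from the serial loop because the additions happen in a different order.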

Advice: if you do not get a good speed-up, try iterating in chunks, so that each thread handles a block of consecutive iterations rather than a single one. For instance:

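#include <omp.h>
#include <algorithm>  // std::min
int num_threads = 1;
#pragma omp parallel
#pragma omp single
num_threads = omp_get_num_threads();
int chunkSize = 20; // Define your own chunk size here
// 'total' is the number of items to process
for (int position = 0; position < total; position += chunkSize * num_threads) {
    int endOfChunk = std::min(position + chunkSize * num_threads, total);
    #pragma omp parallel for
    for (int ip = position; ip < endOfChunk; ip += chunkSize) {
        int chunkEnd = std::min(ip + chunkSize, endOfChunk);
        for (int k = ip; k < chunkEnd; k++) {
            // ... work for item k ...
        }
    }
}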
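
OpenMP can also do this chunking for you through the schedule clause, which avoids starting a new parallel region for every block of chunkSize*num_threads items. A minimal sketch; process_item and process_all are hypothetical stand-ins for the real per-item work.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-item work.
double process_item(int k) { return 2.0 * k; }

void process_all(std::vector<double>& out, int chunkSize) {
    assert(chunkSize > 0);
    const int total = static_cast<int>(out.size());
    // schedule(static, chunkSize) hands out blocks of chunkSize consecutive
    // iterations to the threads in round-robin order.
    #pragma omp parallel for schedule(static, chunkSize)
    for (int k = 0; k < total; k++) {
        out[k] = process_item(k);
    }
}
```

When iterations take very different amounts of time, schedule(dynamic, chunkSize) is worth trying instead: idle threads then grab the next block rather than following a fixed assignment.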

