openMP only on inner loop not working

Submitted by ╄→гoц情女王★ on 2019-12-24 06:59:14

Question


This is an update to my original question with a working code and runtimes included.

I have a simple code that does a 2D random walk with multiple walkers over a number of steps. I'm trying to parallelize the walkers into groups, one per thread, with OpenMP applied only to the inner loop.

Here is the code. It outputs the step number versus the root mean square displacement (RMSD). As a check on the results, a plot of step vs. RMSD should follow a power law with an index around 0.5 (which it does).

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int main(int argc,char **argv){

  // define variables
  int    i,j;
  double msd,xij,yij,randm;

  // inputs
  // ----------------------
  // set size
  int walkers = 2000;
  int steps   = 50000;
  // ----------------------

  // allocate arrays
  double *xpos = malloc(walkers*sizeof(double));
  double *ypos = malloc(walkers*sizeof(double));
  double *thet = malloc(walkers*steps*sizeof(double));
  int    *step = malloc(steps*sizeof(int));
  double *rmsd = malloc(steps*sizeof(double));

  // initialize
  double dr = 0.2;
  double pi = 4.0*atan(1.0);
  for(i=0; i<walkers; i++){
    xpos[i] = 0.0;
    ypos[i] = 0.0;
  }

  // generate random angles
  srand(time(NULL));
  for(i=0; i<steps; i++){
    for(j=0; j<walkers; j++){
      randm = rand();
      randm = (randm/RAND_MAX)*2.0*pi;
      thet[i*walkers+j] = randm;
    }
  }

  // random walk
  #pragma omp parallel private(i,j,xij,yij)
  for(i=0; i<steps; i++){
    msd = 0.0;
    #pragma omp barrier
    #pragma omp for reduction(+:msd)
    for(j=0; j<walkers; j++){
      xpos[j] += dr*cos(thet[i*walkers+j]);
      ypos[j] += dr*sin(thet[i*walkers+j]);
      xij = xpos[j];
      yij = ypos[j];
      // get displacement
      msd += xij*xij + yij*yij;
    }
    // store values to array
    #pragma omp single
    step[i] = i+1;
    #pragma omp single
    rmsd[i] = sqrt(msd/walkers);
  }

  // write output to file
  FILE *f = fopen("random_walk_c_omp.txt","w");
  for(i=0; i<steps; i++){
    fprintf(f,"%i  %f\n",step[i],rmsd[i]);
  }
  fclose(f);

  // free arrays
  free(xpos);
  free(ypos);
  free(thet);
  free(step);
  free(rmsd);

}

Here are the runtimes.

Serial version compiled without OpenMP:

gcc-5 random_walk_c_omp.c -o random_walk_c_omp -O3 -Wall
time ./random_walk_c_omp
2.55 real         2.36 user         0.18 sys

OpenMP version with OMP_NUM_THREADS=1:

gcc-5 random_walk_c_omp.c -o random_walk_c_omp -O3 -Wall -fopenmp
time ./random_walk_c_omp
2.81 real         2.62 user         0.17 sys

OpenMP version with OMP_NUM_THREADS=4:

gcc-5 random_walk_c_omp.c -o random_walk_c_omp -O3 -Wall -fopenmp
time ./random_walk_c_omp
4.36 real         3.42 user         3.80 sys

I'm no expert with C, so feel free to throw stones here, but is there something wrong with this OpenMP implementation that would explain why it's slower than the serial version? My guess is that the inner loop does too little work relative to the many iterations of the outer loop.

Source: https://stackoverflow.com/questions/37622315/openmp-only-on-inner-loop-not-working
