Question
This is an update to my original question, now with working code and runtimes included.
I have a simple program that performs a 2D random walk with multiple walkers over a number of steps. I'm trying to split the walkers into groups, one group per thread, using OpenMP only on the inner loop.
Here is the code. It outputs the step number versus the root-mean-square displacement (RMSD). As a check on the results, the plot of step versus RMSD should follow a power law with an exponent of about 0.5, which it does.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int main(int argc,char **argv){
    // define variables
    int i,j;
    double msd,xij,yij,randm;

    // inputs
    // ----------------------
    // set size
    int walkers = 2000;
    int steps = 50000;
    // ----------------------

    // allocate arrays
    double *xpos = malloc(walkers*sizeof(double));
    double *ypos = malloc(walkers*sizeof(double));
    double *thet = malloc(walkers*steps*sizeof(double));
    int *step = malloc(steps*sizeof(int));
    double *rmsd = malloc(steps*sizeof(double));

    // initialize
    double dr = 0.2;
    double pi = 4.0*atan(1.0);
    for(i=0; i<walkers; i++){
        xpos[i] = 0.0;
        ypos[i] = 0.0;
    }

    // generate random angles
    srand(time(NULL));
    for(i=0; i<steps; i++){
        for(j=0; j<walkers; j++){
            randm = rand();
            randm = (randm/RAND_MAX)*2.0*pi;
            thet[i*walkers+j] = randm;
        }
    }

    // random walk
    #pragma omp parallel private(i,j,xij,yij)
    for(i=0; i<steps; i++){
        msd = 0.0;
        #pragma omp barrier
        #pragma omp for reduction(+:msd)
        for(j=0; j<walkers; j++){
            xpos[j] += dr*cos(thet[i*walkers+j]);
            ypos[j] += dr*sin(thet[i*walkers+j]);
            xij = xpos[j];
            yij = ypos[j];
            // get displacement
            msd += xij*xij + yij*yij;
        }
        // store values to array
        #pragma omp single
        step[i] = i+1;
        #pragma omp single
        rmsd[i] = sqrt(msd/walkers);
    }

    // write output to file
    FILE *f = fopen("random_walk_c_omp.txt","w");
    for(i=0; i<steps; i++){
        fprintf(f,"%i %f\n",step[i],rmsd[i]);
    }
    fclose(f);

    // free arrays
    free(xpos);
    free(ypos);
    free(thet);
    free(step);
    free(rmsd);
}
Here are the runtimes.
Serial version compiled without OpenMP:
gcc-5 random_walk_c_omp.c -o random_walk_c_omp -O3 -Wall
time ./random_walk_c_omp
2.55 real 2.36 user 0.18 sys
OpenMP version with OMP_NUM_THREADS=1:
gcc-5 random_walk_c_omp.c -o random_walk_c_omp -O3 -Wall -fopenmp
time ./random_walk_c_omp
2.81 real 2.62 user 0.17 sys
OpenMP version with OMP_NUM_THREADS=4:
gcc-5 random_walk_c_omp.c -o random_walk_c_omp -O3 -Wall -fopenmp
time ./random_walk_c_omp
4.36 real 3.42 user 3.80 sys
I'm no expert in C, so feel free to throw stones here, but is there something wrong with this OpenMP implementation that would explain why it is slower than the serial version? My guess is that the inner loop has too little work (2000 walkers) compared with the many iterations of the outer loop (50000 steps).
Source: https://stackoverflow.com/questions/37622315/openmp-only-on-inner-loop-not-working