Why fails the following OpenMP program to reduce my variable?

问题

Consider the following minimal C code example. When compiling and executing with export OMP_NUM_THREADS=4 && gcc -fopenmp minimal.c && ./a.out (GCC 4.9.2 on Debian 8), this produces five lines with rho=100 (sometimes also 200 or 400) on my machine. Expected output is of course rho=400 for all five printed lines.

The program is more likely to produce the correct result if I insert more code at // MARKER or place a barrier just there. But even with another barrier, it sometimes fails and so does my program. So the problem seems to be that a is not properly initialized when going into the reduction loop.

The OpenMP 4.0.0 manual even states on page 55 that there is an implicit barrier at the end of a loop construct unless a nowait clause is specified. So a should be set up at this point. What is going wrong here? Am I missing something?

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#define ID omp_get_thread_num()
#else
#define ID 0
#endif

double a[100];

int main(int argc, char *argv[]) {
    int i;
    double rho;
    #pragma omp parallel
    {
        #pragma omp for
        for (i = 0; i < 100; i++) {
            a[i] = 2;
        }
        // MARKER
        rho = 0.0;
        #pragma omp for reduction(+: rho)
        for (i = 0; i < 100; i++) {
            rho += ((a[i])*(a[i]));
        }
        fprintf(stderr, "[%d] rho=%f\n", ID, rho);
    }
    fprintf(stderr, "[%d] rho=%f\n", ID, rho);
    return 0;
}

回答1:

OK I've got the answer, but I sweat to get it...

This is a race condition due to the fact that rho is shared and that you initialise it inside the parallel region like this rho = 0.0;

Either initialising it outside of the parallel region, or using a #pragma omp single right before will fix the code...

来源：https://stackoverflow.com/questions/33186421/why-fails-the-following-openmp-program-to-reduce-my-variable

标签

openmp