OpenMP parallel for reduction delivers wrong results

问题

I am working with a signal matrix and my goal is to calculate the sum of all elements of a row. The matrix is represented by the following struct:

typedef struct matrix {
  float *data;
  int rows;
  int cols;
  int leading_dim;
} matrix;

I have to mention the matrix is stored in column-major order (http://en.wikipedia.org/wiki/Row-major_order#Column-major_order), which should explain the formula column * tan_hd.rows + row for retrieving the correct indices.

for(int row = 0; row < tan_hd.rows; row++) {
    float sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for(int column = 0; column < tan_hd.cols; column++) {
        sum += tan_hd.data[column * tan_hd.rows + row];
    }
    printf("row %d: %f", row, sum);
}

Without the OpenMP pragma, the delivered result is correct and looks like this:

row 0: 8172539.500000 row 1: 8194582.000000

As soon as I add the #pragma omp... as described above, a different (wrong) result is returned:

row 0: 8085544.000000 row 1: 8107186.000000

In my understanding, reduction(+:sum) creates private copies of sum for each thread, and after completing the loop these partial results are summed up and written back to the global variable sum again. What is it, that I am doing wrong?

I appreciate your suggestions!

回答1:

Use the Kahan summation algorithm

It has the same algorithmic complexity as a naive summation
It will greatly increase the accuracy of a summation, without requiring you to switch data types to double.

By rewriting your code to implement it:

for(int row = 0; row < tan_hd.rows; row++) {
    float sum = 0.0, c = 0.0;
    #pragma omp parallel for reduction(+:sum, +:c)
    for(int column = 0; column < tan_hd.cols; column++) {
        float y = tan_hd.data[column * tan_hd.rows + row] - c;
        float t = sum + y;
        c = (t - sum) - y;
        sum = t;
    }
    sum = sum - c;
    printf("row %d: %f", row, sum);
}

You can additionally switch all float to double to achieve a higher precision, but since your array is a float array, there should only be differences in the number of signficant numbers at the end.

来源：https://stackoverflow.com/questions/18013345/openmp-parallel-for-reduction-delivers-wrong-results

标签

parallel-processing

openmp

reduction