Sequential and parallel versions give different results - Why?

情到浓时终转凉″ 提交于 2020-01-15 05:24:12

问题


I have a nested loop: (L and A are fully defined inputs)

    #pragma omp parallel for schedule(guided) shared(L,A) \
    reduction(+:dummy)
    for (i=k+1;i<row;i++){
            for (n=0;n<k;n++){
                #pragma omp atomic
                dummy += L[i][n]*L[k][n];
                L[i][k] = (A[i][k] - dummy)/L[k][k];
            }
            dummy = 0;
    }

And its sequential version:

    for (i=k+1;i<row;i++){
            for (n=0;n<k;n++){
                dummy += L[i][n]*L[k][n];
                L[i][k] = (A[i][k] - dummy)/L[k][k];
            }
            dummy = 0;
    }

They both give different results. And parallel version is much slower than the sequential version.

What may cause the problem?

Edit:

To get rid of the problems caused by the atomic directive, I modified the code as follows:

#pragma omp parallel for schedule(guided) shared(L,A) \
    private(i)
    for (i=k+1;i<row;i++){
        double dummyy = 0;
        for (n=0;n<k;n++){
            dummyy += L[i][n]*L[k][n];
            L[i][k] = (A[i][k] - dummyy)/L[k][k];
        }
    }

But it also didn't work out the problem. Results are still different.


回答1:


The difference in results comes from the inner loop variable n, which is shared between threads, since it is defined outside of the omp pragma.

Clarified: The loop variable n should be declared inside the omp pragma, since it should be thread-specific, for example for (int n = 0;.....)




回答2:


I am not very familiar with OpenMP but it seems to me that your calculations are not order-independent. Namely, the result in the inner loop is written into L[i][k] where i and k are invariants for the inner loop. This means that the same value is overwritten k times during the inner loop, resulting in a race condition.

Moreover, dummy seems to be shared between the different threads, so there might be a race condition there too, unless your pragma parameters somehow prevent it.

Altogether, to me it looks like the calculations in the inner loop must be performed in the same sequential order, if you want the same result as given by the sequential execution. Thus only the outer loop can be parallelized.




回答3:


In your parallel version you've inserted an unnecessary (and possibly harmful) atomic directive. Once you've declared dummy to be a reduction variable OpenMP takes care of stopping the threads interfering in the reduction. I think the main impact of the unnecessary directive is to slow your code down, a lot.

I see you have another answer addressing the wrongness of your results. But I notice that you seem to set dummy to 0 at the end of each outer loop iteration, which seems strange if you are trying to use it as some kind of accumulator, which is what the reduction clause suggests. Perhaps you want to reduce to dummy across the inner loop ?

If you are having problems with reduction read this.



来源:https://stackoverflow.com/questions/10052867/sequential-and-parallel-versions-give-different-results-why

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!