OpenMP - Nested for-loop becomes faster when having parallel before outer loop. Why?


感动是毒 2020-12-14 12:44

I'm currently implementing a dynamic programming algorithm for solving knapsack problems. Therefore my code has two for-loops, an outer and an inner loop.

From th

2 answers

醉话见心 (OP)

2020-12-14 12:52
I think the simple reason is that, since you place your #pragma omp parallel at an outer scope level (second version), the overhead of starting the threads is lower.

~~In other words, in the first version you trigger thread creation itemRows times in the outer loop, whereas in the second version you trigger it only once.~~ And I do not know why!

I tried to reproduce this with a simple example, using 4 threads with HT enabled:
```
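#include <algorithm>
#include <iostream>
#include <vector>
#include <omp.h>

int main()
{
    std::vector<double> v(10000);
    // Fill v with 0.0, 1.0, 2.0, ...
    std::generate(v.begin(), v.end(), []() { static double n{0.0}; return n++; });

    double start = omp_get_wtime();

    #pragma omp parallel // version 2: one parallel region around the outer loop
    for (auto& el : v)
    {
        double t = el - 1.0;
        // #pragma omp parallel // version 1: a new parallel region per outer iteration
        #pragma omp for
        for (size_t i = 0; i < v.size(); i++)
        {
            // Note: el is shared between the threads, so these updates are
            // unsynchronized; the loop only gives the threads work to time.
            el += v[i];
            el -= t;
        }
    }
    double end = omp_get_wtime();

    std::cout << "   wall time : " << end - start << std::endl;
    // for (const auto& el : v) { std::cout << el << ";"; }

}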
```
Comment/uncomment the pragmas according to the version you want. If you compile with -std=c++11 -fopenmp -O2, you should see that version 2 is faster.

Demo on Coliru

Live version 1 wall time: 0.512144 s

Live version 2 wall time: 0.333664 s