OpenMP - Nested for-loop becomes faster when parallel is placed before the outer loop. Why?

感动是毒 2020-12-14 12:44

I'm currently implementing a dynamic programming algorithm for solving knapsack problems, so my code has two for-loops: an outer and an inner loop.

From th
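For context, here is a minimal sketch of a DP with that outer/inner loop structure. The question's code is cut off, so everything below is a hypothetical illustration, not the asker's implementation. It assumes the classic 0/1 knapsack table recurrence. The function name knapsack and the table layout are invented, and itemRows is borrowed from the answer for readability. It places #pragma omp parallel before the outer loop and #pragma omp for on the inner loop over capacities. That is safe because row i only reads row i - 1.

```cpp
#include <algorithm>  // std::max
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 0/1 knapsack solver (not the asker's code).
// table[i][c] = best total value using the first i items with capacity c.
int knapsack(const std::vector<int>& weights, const std::vector<int>& values, int capacity)
{
    assert(weights.size() == values.size() && capacity >= 0);
    const std::size_t itemRows = weights.size();
    std::vector<std::vector<int>> table(itemRows + 1, std::vector<int>(capacity + 1, 0));

    // One parallel region around the outer loop: every thread walks all rows,
    // and "omp for" divides each row's capacities among the threads.
    #pragma omp parallel
    for (std::size_t i = 1; i <= itemRows; ++i)
    {
        const int w = weights[i - 1];
        const int val = values[i - 1];
        // Row i reads only row i - 1. The implicit barrier at the end of
        // "omp for" guarantees row i is complete before any thread starts row i + 1.
        #pragma omp for
        for (int c = 0; c <= capacity; ++c)
        {
            int best = table[i - 1][c];                            // skip item i
            if (w <= c)
                best = std::max(best, table[i - 1][c - w] + val);  // take item i
            table[i][c] = best;
        }
    }
    return table[itemRows][capacity];
}
```

For example, knapsack({1, 3, 4, 5}, {1, 4, 5, 7}, 7) returns 9 (the items with weights 3 and 4). Without -fopenmp the pragmas are ignored and the same code runs sequentially.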

2 answers
  •  醉话见心
    2020-12-14 12:52

    I think the simple reason is that, since you place your #pragma omp parallel at an outer scope level (second version), less time is spent on thread-management overhead.

    In other words, in the first version the thread team is started and joined again itemRows times (once per iteration of the outer loop), whereas in the second version it is started only once. Beyond that, I do not know why!
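To look at that overhead in isolation, here is a minimal timing sketch. The function names are hypothetical. The first function enters an empty parallel region once per iteration, as the first version does. The second uses a single region whose threads only meet at a barrier each iteration. That barrier roughly stands in for the implicit barrier that ends each #pragma omp for in the second version. The sketch uses std::chrono instead of omp_get_wtime, so it also compiles without OpenMP, in which case the pragmas are simply ignored. Runtimes typically keep their worker threads alive between regions. If so, the extra per-region cost is mostly waking the team and joining it, not creating OS threads.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helpers for comparing the synchronisation cost of the two layouts.

// First-version layout: a fresh parallel region (fork + join) per iteration.
double time_region_per_iteration(int iterations)
{
    assert(iterations >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int k = 0; k < iterations; ++k)
    {
        #pragma omp parallel
        {
            // Empty body: only entering and leaving the region is measured.
        }
    }
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();  // seconds
}

// Second-version layout: one region for all iterations; threads only meet
// at a barrier, like the implicit barrier that ends each "omp for".
double time_single_region(int iterations)
{
    assert(iterations >= 0);
    const auto start = std::chrono::steady_clock::now();
    #pragma omp parallel
    for (int k = 0; k < iterations; ++k)
    {
        #pragma omp barrier
    }
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();  // seconds
}
```

Calling both with the same iteration count (say 10000) in a build with -fopenmp, I would expect the first to take noticeably longer. The actual figures depend on the machine, the thread count and the OpenMP runtime.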

    I tried to reproduce this with a simple example, using 4 threads with Hyper-Threading enabled:

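    #include <algorithm>  // std::generate
    #include <cstddef>    // std::size_t
    #include <iostream>   // std::cout
    #include <vector>
    #include <omp.h>      // omp_get_wtime
    
    int main()
    {
        // 10000 elements holding 0.0, 1.0, 2.0, ...
        std::vector<double> v(10000);
        std::generate(v.begin(), v.end(), []() { static double n{0.0}; return n++; });
    
        double start = omp_get_wtime();
    
        #pragma omp parallel // version 2: a single thread team for the whole loop nest
        for (auto& el : v)
        {
            double t = el - 1.0;
            // #pragma omp parallel // version 1: a new parallel region per outer iteration
            #pragma omp for
            for (std::size_t i = 0; i < v.size(); i++)
            {
                // Caveat: threads working on different i all update the same el,
                // which is a data race, so the values left in v are unreliable;
                // only the wall time is meaningful in this demo.
                el += v[i];
                el -= t;
            }
        }
        double end = omp_get_wtime();
    
        std::cout << "   wall time : " << end - start << std::endl;
        // for (const auto& el : v) { std::cout << el << ";"; }
    }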
    

    Comment/uncomment the pragmas to choose the version you want. If you compile with -std=c++11 -fopenmp -O2, you should see that version 2 is faster.

    Demo on Coliru

    Version 1 wall time: 0.512144 s

    Version 2 wall time: 0.333664 s
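In the demo, threads working on different i update the same el at the same time, which is a data race. If only the timing matters, a race-free sketch of the same loop nest can give each inner iteration its own output slot out[i], so no two threads ever write the same element. This changes the arithmetic, so it is not the answer's computation. It keeps the two layouts being compared. The names run_demo, one_region and DemoResult are hypothetical, and std::chrono replaces omp_get_wtime.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>  // std::move
#include <vector>

// Hypothetical result type: the computed values plus the measured wall time.
struct DemoResult
{
    std::vector<double> out;
    double seconds;
};

// Race-free variant of the demo's loop nest. Each inner iteration i writes
// only out[i], so no two threads write the same element. The arithmetic
// differs from the original "el += v[i]; el -= t;"; only the layouts are kept.
DemoResult run_demo(const std::vector<double>& v, bool one_region)
{
    std::vector<double> out(v.size(), 0.0);
    const auto start = std::chrono::steady_clock::now();
    if (one_region)
    {
        #pragma omp parallel              // version 2: one team for the whole nest
        for (const double el : v)
        {
            const double t = el - 1.0;
            #pragma omp for               // implicit barrier ends each outer step
            for (std::size_t i = 0; i < v.size(); ++i)
                out[i] += v[i] - t;
        }
    }
    else
    {
        for (const double el : v)
        {
            const double t = el - 1.0;
            #pragma omp parallel for      // version 1: new region per outer step
            for (std::size_t i = 0; i < v.size(); ++i)
                out[i] += v[i] - t;
        }
    }
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return DemoResult{std::move(out), elapsed.count()};
}
```

Each out[i] is accumulated by one thread at a time and in the same outer order, so run_demo(v, false) and run_demo(v, true) should return identical values. For v = {0, 1, 2, 3} both give {-2, 2, 6, 10}. Only seconds should differ, and the one-region layout is expected to be faster on large inputs.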
