I'm currently implementing a dynamic programming algorithm for solving knapsack problems, so my code has two for-loops: an outer and an inner loop.
From th
I think the simple reason is that, since you place your #pragma omp parallel at an outer scope level (second version), the overhead of creating threads is lower.
In other words, in the first version you trigger thread creation inside the first loop, so it happens itemRows times, whereas in the second version you trigger it only once.
I tried to reproduce this with a simple example, using 4 threads with HT enabled:
#include <algorithm>
#include <iostream>
#include <vector>
#include <omp.h>

int main()
{
    std::vector<double> v(10000);
    std::generate(v.begin(), v.end(), []() { static double n{0.0}; return n++; });

    double start = omp_get_wtime();

    #pragma omp parallel // version 2: one parallel region around the outer loop
    for (auto& el : v)
    {
        double t = el - 1.0;
        // #pragma omp parallel // version 1: a new parallel region on every outer iteration
        #pragma omp for
        for (size_t i = 0; i < v.size(); i++)
        {
            // Timing illustration only: the updates to el are not synchronized between threads.
            el += v[i];
            el -= t;
        }
    }

    double end = omp_get_wtime();
    std::cout << " wall time : " << end - start << std::endl;
    // for (const auto& el : v) { std::cout << el << ";"; }
}
Comment/uncomment the two #pragma omp parallel lines according to the version you want. If you compile with -std=c++11 -fopenmp -O2, you should see that version 2 is faster.
Demo on Coliru:
Live version 1, wall time: 0.512144 s
Live version 2, wall time: 0.333664 s
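Applied to a knapsack-style pair of loops, the same idea could look roughly like the sketch below. This is not the code from the question: the function names knapsack_v1 and knapsack_v2, the parameters weight, value and capacity, and the two-row table layout are all assumptions made for illustration, with weight.size() playing the role of itemRows. Version 1 opens a parallel region once per item row; version 2 opens it once and only shares the inner loop among the existing threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical 0/1 knapsack on two rows of the DP table (names are illustrative).
// Version 1: "parallel for" sits inside the outer loop, so a parallel region
// is entered once per item row.
int knapsack_v1(const std::vector<int>& weight, const std::vector<int>& value, int capacity)
{
    std::vector<int> prev(capacity + 1, 0), cur(capacity + 1, 0);
    for (std::size_t item = 0; item < weight.size(); ++item)
    {
        #pragma omp parallel for
        for (int c = 0; c <= capacity; ++c)
        {
            cur[c] = prev[c]; // skip the item
            if (c >= weight[item]) // or take it, if it fits
                cur[c] = std::max(cur[c], prev[c - weight[item]] + value[item]);
        }
        std::swap(prev, cur);
    }
    return prev[capacity];
}

// Version 2: a single parallel region around the outer loop. Every thread runs
// the outer loop; "omp for" only splits the inner iterations among them.
int knapsack_v2(const std::vector<int>& weight, const std::vector<int>& value, int capacity)
{
    std::vector<int> prev(capacity + 1, 0), cur(capacity + 1, 0);
    #pragma omp parallel
    for (std::size_t item = 0; item < weight.size(); ++item)
    {
        #pragma omp for
        for (int c = 0; c <= capacity; ++c)
        {
            cur[c] = prev[c];
            if (c >= weight[item])
                cur[c] = std::max(cur[c], prev[c - weight[item]] + value[item]);
        }
        // The implicit barrier after "omp for" guarantees the row is complete.
        // One thread swaps the rows; "single" ends with a barrier as well,
        // so no thread starts the next row before the swap is done.
        #pragma omp single
        std::swap(prev, cur);
    }
    return prev[capacity];
}
```

Both functions should return the same value for the same input, for example knapsack_v2({1, 3, 4, 5}, {1, 4, 5, 7}, 7). The extra care needed in version 2 is the swap: because every thread executes the outer loop body, anything that must happen exactly once per row has to go in a single (or equivalent) construct.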