Reduction with OpenMP: linear merging or log(number of threads) merging
I have a general question about reductions in OpenMP that has bothered me for a while. It concerns how the partial sums are merged at the end of a reduction. The merge can be done either linearly, in a number of steps proportional to the number of threads, or as a tree, in a number of steps proportional to the log of the number of threads.

Let's assume I want to do a reduction of some function double f(int i). With OpenMP I could do it like this:

    double sum = 0.0;
    #pragma omp parallel for reduction (+:sum)
    for(int i=0; i<n; i++) {
        sum += f(i);
    }

However, I claim that the following code will be just as efficient:

    double sum = 0.0;
    #pragma omp parallel
    {
        double sum_private = 0.0;
        #pragma omp for nowait