OpenMP/C++: Parallel for loop with reduction afterwards - best practice?


Question:

Given the following code...

for (size_t i = 0; i < clusters.size(); ++i) {
    const std::set<int>& cluster = clusters[i];
    // ... expensive calculations ...
    for (int j : cluster)
        velocity[j] += f(j);
}

...which I would like to run on multiple CPUs/cores. The function f does not use velocity.

A simple #pragma omp parallel for before the first for loop will produce unpredictable/wrong results, because the std::vector<T> velocity is modified in the inner loop. Multiple threads may access and (try to) modify the same element of velocity at the same time.
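For concreteness, the naive attempt is just the loop above with the pragma added (a sketch of what should not be done):

#pragma omp parallel for
for (size_t i = 0; i < clusters.size(); ++i) {
    const std::set<int>& cluster = clusters[i];
    // ... expensive calculations ...
    for (int j : cluster)
        velocity[j] += f(j); // data race: two threads may update the same velocity[j] at once
}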

I think the first solution would be to write #pragma omp atomic before the velocity[j] += f(j); operation. This gives me a compile error (it might have something to do with the elements being of type Eigen::Vector3d, or with velocity being a class member). Also, I have read that atomic operations are very slow compared to having a private variable for each thread and doing a reduction at the end. So that's what I would like to do, I think.
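For reference, #pragma omp atomic only accepts scalar updates, which is presumably why the Eigen::Vector3d += is rejected. A per-component variant would look roughly like this (purely illustrative; fj is a hypothetical local holding the result of f(j)):

const Eigen::Vector3d fj = f(j); // evaluate f(j) once, outside the atomics
double* vj = velocity[j].data(); // raw double components of this Vector3d
for (int k = 0; k < 3; ++k) {
    #pragma omp atomic
    vj[k] += fj[k];              // scalar atomic update, one component at a time
}

Three atomics per update would likely be even slower than one, though, so a per-thread copy plus a final reduction still looks like the more promising route.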

I have come up with this:

#pragma omp parallel
{
    // these variables are local to each thread
    std::vector<Eigen::Vector3d> velocity_local(velocity.size());
    std::fill(velocity_local.begin(), velocity_local.end(), Eigen::Vector3d(0,0,0));

    #pragma omp for
    for (size_t i = 0; i < clusters.size(); ++i)
    {
        const std::set<int>& cluster = clusters[i];
        // ... expensive calculations ...
        for (int j : cluster)
            velocity_local[j] += f(j); // save results from the previous calculations
    }

    // now each thread can save its results to the global variable
    #pragma omp critical
    {
        for (size_t i = 0; i < velocity_local.size(); ++i)
            velocity[i] += velocity_local[i];
    }
}

Is this a good solution? Is it the best solution? (Is it even correct?)

Further thoughts: Using the reduction clause (instead of the critical section) throws a compiler error. I think this is because velocity is a class member.

I have tried to find a question with a similar problem, and this question looks like it's almost the same. But I think my case might differ because the last step includes a for loop. Also the question whether this is the best approach still holds.

Edit: As requested in a comment: the reduction clause...

#pragma omp parallel reduction(+:velocity)
for (omp_int i = 0; i < velocity_local.size(); ++i)
    velocity[i] += velocity_local[i];

...throws the following error:

error C3028: 'ShapeMatching::velocity' : only a variable or static data member can be used in a data-sharing clause

(similar error with g++)

Answer 1:

You're doing an array reduction. I have described this several times (e.g. "reducing an array in openmp" and "fill histograms array reduction in parallel with openmp without using a critical section"). You can do this with and without a critical section.

You have already done this correctly with a critical section (in your recent edit) so let me describe how to do this without a critical section.


std::vector<Eigen::Vector3d> velocitya;
#pragma omp parallel
{
    const int nthreads = omp_get_num_threads();
    const int ithread  = omp_get_thread_num();
    const int vsize    = velocity.size();

    #pragma omp single
    velocitya.resize(vsize*nthreads);
    std::fill(velocitya.begin()+vsize*ithread, velocitya.begin()+vsize*(ithread+1),
              Eigen::Vector3d(0,0,0));

    #pragma omp for schedule(static)
    for (size_t i = 0; i < clusters.size(); i++) {
        const std::set<int>& cluster = clusters[i];
        // ... expensive calculations ...
        for (int j : cluster) velocitya[ithread*vsize+j] += f(j);
    }

    #pragma omp for schedule(static)
    for (int i = 0; i < vsize; i++) {
        for (int t = 0; t < nthreads; t++) {
            velocity[i] += velocitya[vsize*t + i];
        }
    }
}

This method requires extra care/tuning due to false sharing, which I have not done here.
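One common mitigation, sketched below and not part of the code above, is to pad each thread's stripe of velocitya so that neighbouring stripes do not sit on the same cache line. The names vsize, nthreads and ithread come from the code above; pad and stride are new, illustrative choices:

// Sketch only: round the per-thread stripe length up to a multiple of 8 elements.
// Eigen::Vector3d is 24 bytes, so 8 elements span 192 bytes (three 64-byte lines),
// which greatly reduces (though without aligned allocation does not fully eliminate)
// the chance that two threads' stripes share a cache line.
const int pad    = 8;
const int stride = ((vsize + pad - 1) / pad) * pad;
#pragma omp single
velocitya.resize(std::size_t(stride) * nthreads);
// ... then index with velocitya[ithread*stride + j] and velocitya[stride*t + i]
//     instead of the vsize-based indexing above.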

As to which method is better, you will have to test.
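As an addendum (not part of the original answer): with a compiler supporting OpenMP 4.5 or later, the manual per-thread copies can in principle be replaced by a user-defined reduction over an array section. The reduction identifier vec3_plus and the local pointer vel below are illustrative names; taking a local pointer also sidesteps the "class member in a data-sharing clause" error from the question, since only a plain local variable appears in the clause. Compiler support for this combination varies, so treat it as a sketch:

#pragma omp declare reduction(vec3_plus : Eigen::Vector3d : omp_out += omp_in) \
    initializer(omp_priv = Eigen::Vector3d::Zero())

Eigen::Vector3d* vel = velocity.data();              // local pointer, not the class member
const int n = static_cast<int>(velocity.size());

#pragma omp parallel for reduction(vec3_plus : vel[:n])
for (int i = 0; i < static_cast<int>(clusters.size()); ++i) {
    const std::set<int>& cluster = clusters[i];
    // ... expensive calculations ...
    for (int j : cluster)
        vel[j] += f(j);                              // each thread updates its private copy of vel[0..n)
}

Memory-wise this is equivalent to the per-thread copies above (each thread still holds n vectors), so it is mainly a convenience rather than a performance win.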


