Question
OpenMP 4.5+ supports vector/array reductions in C++ (press release).
This capability lets us write, e.g.:
#include <vector>
#include <iostream>

int main() {
    std::vector<int> vec;

    #pragma omp declare reduction (merge : std::vector<int> : omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))

    #pragma omp parallel for default(none) schedule(static) reduction(merge: vec)
    for (int i = 0; i < 100; i++)
        vec.push_back(i);

    for (const auto x : vec)
        std::cout << x << "\n";
    return 0;
}
The problem is that, when this code runs, the results from the various threads may be combined in any order.
Is there a way to enforce the order so that thread 0's results precede thread 1's, and so on?
Answer 1:
The order of a reduction is explicitly not specified. ("The location in the OpenMP program at which the values are combined and the order in which the values are combined are unspecified.", 2.15.3.6 in OpenMP 4.5). Therefore you cannot use a reduction.
One way would be to use ordered as follows:
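std::vector<int> vec;
// The ordered clause on the loop directive is required for the ordered region below.
#pragma omp parallel for default(none) schedule(static) ordered shared(vec)
for (int i = 0; i < 100; i++) {
    // do some computations here
    #pragma omp ordered
    vec.push_back(i);
}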
Note that vec is now shared, and ordered implies serialized execution and synchronization among threads. This can be very bad for performance unless each of your computations requires a significant and uniform amount of time.
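For reference, the ordered approach can be wrapped into a self-contained function, roughly as sketched below. The name collect_ordered and the squaring step are hypothetical placeholders for the real per-iteration work; they are not part of the original answer.

```cpp
#include <vector>

// Hypothetical helper: collects i * i for i in [0, n) in iteration order.
std::vector<int> collect_ordered(int n) {
    std::vector<int> vec;
    // The `ordered` clause on the loop directive is what makes the
    // `ordered` region inside the loop body valid.
    #pragma omp parallel for default(none) schedule(static) ordered shared(vec, n)
    for (int i = 0; i < n; i++) {
        const int value = i * i;  // placeholder for the parallel computation
        #pragma omp ordered
        vec.push_back(value);     // runs one iteration at a time, in loop order
    }
    return vec;
}
```

Calling collect_ordered(100) should give the squares of 0 to 99 in ascending order, regardless of the number of threads. Only the code outside the ordered region can overlap between threads, so this pays off only when that part dominates the run time.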
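You can also make a custom ordered reduction. Split the parallel region from the for loop and manually insert the local results in sequential order. Because schedule(static) without a chunk size gives each thread one contiguous block of iterations, assigned in thread order, appending the local vectors by thread number reproduces the loop order.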
// Requires #include <omp.h> and #include <vector>.
std::vector<int> global_vec;
#pragma omp parallel
{
    std::vector<int> local_vec;  // private to each thread
    #pragma omp for schedule(static)
    for (int i = 0; i < 100; i++) {
        // some computations
        local_vec.push_back(i);
    }
    // Threads take turns: in round t, only thread t appends its results.
    for (int t = 0; t < omp_get_num_threads(); t++) {
        #pragma omp barrier
        if (t == omp_get_thread_num()) {
            global_vec.insert(global_vec.end(), local_vec.begin(), local_vec.end());
        }
    }
}
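A variant of the same idea, not taken from the answer above, avoids the barrier loop: each thread stores its local vector in a slot indexed by its thread number, and the slots are concatenated serially after the parallel region. This is a minimal sketch; the names collect_by_thread and buffers are hypothetical, and the `#ifdef _OPENMP` guards are an assumption that lets the code also compile and run serially without OpenMP.

```cpp
#include <utility>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: returns 0, 1, ..., n-1, filled in parallel
// and merged in thread order.
std::vector<int> collect_by_thread(int n) {
    std::vector<std::vector<int>> buffers;  // one slot per thread

    #pragma omp parallel default(none) shared(buffers, n)
    {
        int tid = 0;
        int nthreads = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        // One thread sizes the outer vector; the implicit barrier at the
        // end of `single` ensures every thread sees the resized vector.
        #pragma omp single
        buffers.resize(nthreads);

        std::vector<int> local;
        // Static schedule without a chunk size: one contiguous block of
        // iterations per thread, assigned in thread order.
        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++) {
            local.push_back(i);  // placeholder for the real computation
        }
        buffers[tid] = std::move(local);  // each thread writes only its own slot
    }

    // Serial merge in thread order, which matches the iteration order.
    std::vector<int> result;
    for (const auto& b : buffers) {
        result.insert(result.end(), b.begin(), b.end());
    }
    return result;
}
```

The trade-off is some extra memory for the outer vector in exchange for a single serial merge instead of one barrier per thread.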
Source: https://stackoverflow.com/questions/44538200/how-to-do-an-ordered-reduction-in-openmp