Use OpenMP to find minimum for sets in parallel, C++

Submitted anonymously (unverified) on 2019-12-03 10:24:21

Question:

I'm implementing Boruvka's algorithm in C++ to find the minimum spanning tree of a graph. The algorithm finds a minimum-weight edge for each supervertex (a supervertex is a connected component; in the first iteration each vertex is its own supervertex) and adds those edges to the MST. Once the edges are added, we update the connected components and repeat the find-min-edge and merge-supervertices steps until all the vertices in the graph are in one connected component.

Since find-min-edge can be done independently for each supervertex, I want to use OpenMP for it. Here is the omp for loop I would like to use for the parallel find-min:

    int index[NUM_VERTICES];

    #pragma omp parallel private(nthreads, tid, index, min) shared(minedgeindex, setcount, forest, EV, umark)
    {
        #pragma omp for
        for(int k = 0; k < setcount; k++){  //iterate over supervertices, omp for here
            min = 9999;
            std::fill_n(index, NUM_VERTICES, -1);

            /* Gets minimum edge for each supervertex */
            for(int i = 0; i < NUM_VERTICES; i++) {
                if(forest[i]->mark == umark[k]){    //find vertices with mark k
                    for(int j = 0; j < NUM_EDGES; j++) {    //check min edge for each vertex in the supervertex k
                        if(EV[j].v1 == i){
                            if(Find(forest[EV[j].v1]) != Find(forest[EV[j].v2])){
                                if(EV[j].w <= min){
                                    min = EV[j].w;
                                    index[k] = j;
                                    break;  //break looping over edges for current vertex i, go to next vertex i+1
                                }
                            }
                        }
                    }
                }
            }   //end finding min disjoint-connecting edge for the supervertex with mark k

            if(index[k] != -1){
                minedgeindex.insert(minedgeindex.begin(), index[k]);
            }
        }   //omp for end
    }

Since I'm new to OpenMP, I currently can't make it work as expected.

Let me briefly explain what I'm doing in this block of code: setcount is the number of supervertices. EV is a vector containing all edges (Edge is a struct I defined earlier, with attributes v1, v2 and w: the two endpoints and the weight). minedgeindex is a vector; I want each thread to find the min edge for one connected component and add its index (the index j in EV) to minedgeindex concurrently, so I think minedgeindex should be shared. forest holds a struct for each vertex, with a set mark umark indicating which supervertex it belongs to. I use Union-Find to mark the supervertices, but that is not relevant to this block of OpenMP code.

The ultimate goal for this block of code is to produce the correct vector minedgeindex containing the min-edge index for each supervertex.

To be clearer, ignoring the graph background: I just have a large vector of numbers partitioned into a number of sets, and I need parallel threads to find the minimum of each set and give me back the indices of those minima, stored in the vector minedgeindex.

If you need more clarification, just ask. Please help me make this work; I think the main issue is the declaration of private and shared variables, which I'm not sure I'm doing correctly.

Thank you in advance!

Answer 1:

Allocating an array outside of a parallel block and then declaring it private is not going to work.

Edit: after reading through your code again, it does not appear that index should even be private. In that case you should declare it outside the parallel block (as you did) but not declare it private. But I am not sure you even need index to be an array; I think you can just declare it as a private int.

Additionally, you can't fill minedgeindex the way you do now; that causes a race condition. You need to put the insertion in a critical section. Personally I would use push_back rather than inserting at the beginning of the vector, since inserting at the front is inefficient.

Some people prefer to explicitly declare everything shared and private. In standard C you sorta have to do this, at least for private. But for C99/C++ this is not necessary. I prefer to only declare shared/private if it's necessary. Everything outside of the parallel region is shared (unless it's an index used in a parallel loop) and everything inside is private. If you keep that in mind you rarely have to explicitly declare data shared or private.

    //int index[NUM_VERTICES]; //index is shared
    //std::fill_n(index, NUM_VERTICES, -1);
    #pragma omp parallel
    {
        #pragma omp for
        for(int k = 0; k < setcount; k++){  //iterate over supervertices, omp for here
            int min = 9999; // min is private
            int index = -1;

            //iterate over supervertices

            if(index != -1){
                #pragma omp critical
                minedgeindex.insert(minedgeindex.begin(), index);
                //minedgeindex.insert(minedgeindex.begin(), index[k]);
            }
        }
    }

Now that the code is working here are some suggestions to perhaps speed it up.

Using a critical section inside the loop can be very inefficient. I suggest filling a private array (std::vector) per thread and then merging the private arrays after the loop (but still inside the parallel block). The loop has an implicit barrier which is not necessary here; it can be removed with nowait.

Independent of the critical section, the time to find each minimum can vary per iteration, so you may want to consider schedule(dynamic). The following code does all of this. Some variation of these suggestions, if not all, may improve your performance.

    #pragma omp parallel
    {
        vector<int> minedgeindex_private;

        #pragma omp for schedule(dynamic) nowait
        for(int k = 0; k < setcount; k++){  //iterate over supervertices, omp for here
            int min = 9999;
            int index = -1;

            //iterate over supervertices

            if(index != -1){
                minedgeindex_private.push_back(index);
            }
        }

        #pragma omp critical
        minedgeindex.insert(
            minedgeindex.end(),
            minedgeindex_private.begin(), minedgeindex_private.end());
    }


Answer 2:

This is not going to work efficiently with OpenMP's default scheduling, because omp for simply splits the iterations statically between all threads, i.e. each thread gets an equal share of the supervertices. However, the work per supervertex may be uneven, in which case the work-sharing between threads is not balanced.

You can try the dynamic or guided schedule with OpenMP, but it may be better to avoid OpenMP altogether and use TBB, whose tbb::parallel_for() avoids this issue.


OpenMP has several disadvantages: 1) it is preprocessor based; 2) it has rather limited functionality (which is what I highlighted above); 3) it isn't standardised for C++ (in particular C++11).

TBB is a pure C++ library (no preprocessor hacks) with full C++11 support. For more details, see my answer to this question.


