Parallelizing a Breadth-First Search


The solution is to protect access to the queue with a critical section.

#include <queue>

std::queue<node*> q;
q.push(head);
while (!q.empty()) {
  int qSize = q.size();            // number of nodes in the current level
  #pragma omp parallel for
  for (int i = 0; i < qSize; i++) {
    node* currNode;
    // Only one thread at a time may touch the shared queue.
    #pragma omp critical
    {
      currNode = q.front();
      q.pop();
    }
    doStuff(currNode);
    // Enqueue the node's children (the next level) under the same
    // protection; push each child here if a node has several.
    #pragma omp critical
    q.push(currNode->children);
  }
}

This is the same as sharing a common mutex between the threads and locking it around every queue operation.
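For comparison, here is the same pattern written with an explicit OpenMP lock instead of the critical sections. This is only a sketch; node, head, children and doStuff are assumed to be the same as in the snippet above.

#include <omp.h>
#include <queue>

// Same level-by-level BFS, but with an explicit lock object guarding the
// queue. Functionally equivalent to the unnamed critical sections above.
std::queue<node*> q;
omp_lock_t qLock;
omp_init_lock(&qLock);

q.push(head);
while (!q.empty()) {
  int qSize = q.size();
  #pragma omp parallel for
  for (int i = 0; i < qSize; i++) {
    node* currNode;
    omp_set_lock(&qLock);          // "lock the common mutex"
    currNode = q.front();
    q.pop();
    omp_unset_lock(&qLock);

    doStuff(currNode);

    omp_set_lock(&qLock);
    q.push(currNode->children);    // enqueue the next level
    omp_unset_lock(&qLock);
  }
}
omp_destroy_lock(&qLock);

Whichever form you pick, the cost is the same: every queue operation serializes the threads.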

This version has some efficiency limits: at the end of each parallel for loop, threads that finish early sit idle at the implicit barrier even though there may still be work in the queue. Writing a version in which threads keep working whenever the queue is non-empty is a bit tricky, because you have to handle the situation where the queue is momentarily empty while other threads are still computing and may yet enqueue new nodes.
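One way to handle this is sketched below, assuming the same node, children and doStuff as in the snippet above: replace the level-by-level parallel for with a plain parallel region in which each thread loops on the queue itself, and keep a counter of threads that are currently processing a node. A thread may only terminate when the queue is empty and that counter is zero, because only then is it certain that no more nodes can appear.

#include <queue>

// Sketch: threads keep pulling from the queue as long as work exists or may
// still be produced. All queue and counter accesses go through the same
// unnamed critical section, so the termination check is consistent.
void bfs_continuous(node* head) {
  std::queue<node*> q;
  q.push(head);
  int busy = 0;                      // threads currently processing a node

  #pragma omp parallel
  {
    while (true) {
      node* currNode = nullptr;
      bool done = false;
      #pragma omp critical
      {
        if (!q.empty()) {
          currNode = q.front();
          q.pop();
          ++busy;                    // claim the node before leaving the critical section
        } else if (busy == 0) {
          done = true;               // nothing queued and nobody can produce more
        }
      }
      if (done) break;
      if (currNode != nullptr) {
        doStuff(currNode);
        #pragma omp critical
        {
          if (currNode->children)    // publish the next level, then retire
            q.push(currNode->children);
          --busy;
        }
      }
      // Otherwise the queue was momentarily empty while another thread was
      // still busy; simply try again.
    }
  }
}

In real code you would probably add some back-off in the retry path or use OpenMP tasks instead of this hand-rolled loop, but the sketch shows the essential termination problem.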

Depending on the amount of data attached to a node, cache effects and false sharing can also have a significant performance impact, but that cannot be judged without a specific example. The simple version will probably be efficient enough in many cases; getting optimal performance can become arbitrarily complex.

In any case, you have to ensure that doStuff does not modify any global or shared state.
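If doStuff does have to update shared data, that update needs its own protection. A minimal sketch, with a hypothetical visitedCount counter used purely for illustration:

// Hypothetical example: counting visited nodes from inside doStuff.
long visitedCount = 0;   // shared between all threads

void doStuff(node* n) {
  // ... per-node work that only touches n ...

  // Unprotected, visitedCount++ would be a data race. An atomic update
  // (or a reduction at the call site) keeps it correct.
  #pragma omp atomic
  visitedCount++;
}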
