find_first of a vector in parallel in C++

Submitted by 时光毁灭记忆、已成空白 on 2019-12-31 04:20:30

Question


I have a quite large vector. I want to check its elements against a certain condition in parallel, and I would like to find the first element that meets the condition.

My problem is very similar to this question (tbb: parallel find first element), but I do not have TBB. Checking the condition is very expensive, so I cannot afford to check all elements sequentially; that is why I would like to run the checks in parallel. Note that I need the first matching element, so its index position matters to me.

For example, suppose I have 4 threads:

ThreadNr   Index   Condition
1          0       not met
2          1       not met
3          2       not met
4          3       not met

ThreadNr   Index   Condition
1          4       not met
2          5       met
3          6       not met
4          7       met

The function has to return index 5. The threads should be distributed over sequential blocks of iterations (the block size can be more than 1; for instance, thread 1 works on the first 4 elements, thread 2 on the next 4 elements, and so on).

In the example above, if thread 4 (index 7) finds a match before thread 2 (index 5), it must wait until all threads have finished their work. As I said before, the lowest matching index is the target.

Please correct me if you have a better algorithm in mind.

NOTE: I can use external libraries such as Boost 1.62 and OpenMP 2.0.
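The block-wise scheme described above can also be sketched without TBB or OpenMP, assuming a C++11 compiler with std::thread and std::atomic is available (the question only mentions Boost 1.62 and OpenMP 2.0). find_first_parallel, block_size and num_threads are hypothetical names, and pred stands in for the expensive condition, which must be safe to call from several threads at once. Threads take consecutive blocks from a shared counter and lower a shared "best index" with an atomic compare-exchange loop, so a block that starts at or after the best match found so far is never scanned.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: returns the index of the first element satisfying pred,
// or v.size() if none does. Blocks are handed out in increasing order.
template <typename T, typename Pred>
std::size_t find_first_parallel(const std::vector<T>& v, Pred pred,
                                std::size_t block_size, unsigned num_threads)
{
    assert(block_size > 0 && num_threads > 0);
    const std::size_t n = v.size();
    std::atomic<std::size_t> next_block(0);  // next block number to hand out
    std::atomic<std::size_t> best(n);        // smallest match so far (n = none)

    auto worker = [&]() {
        for (;;) {
            const std::size_t begin = next_block.fetch_add(1) * block_size;
            // Later blocks start even higher, so they cannot beat best either
            if (begin >= n || begin >= best.load()) return;
            const std::size_t end = std::min(begin + block_size, n);
            for (std::size_t i = begin; i < end && i < best.load(); ++i) {
                if (pred(v[i])) {
                    // Atomic "min": only lower best, never raise it
                    std::size_t cur = best.load();
                    while (i < cur && !best.compare_exchange_weak(cur, i)) {}
                    break;  // later indices in this block cannot be smaller
                }
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();  // wait for every thread, as required
    return best.load();
}
```

With the question's example (matches at indices 5 and 7, blocks of 4 elements, 4 threads), the call returns 5 no matter which thread finishes first, because an index is only skipped once a smaller matching index has been recorded.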


Answer 1:


Since OpenMP 2.0 does not have cancellation constructs, you have to implement cancellation yourself, e.g., by using a shared flag. It also means that you cannot use the for worksharing construct, as breaking out of parallel loops is not permitted (that is why OpenMP 4.0 introduced cancellation constructs). Even with a cancellation check between the evaluations of individual elements, two or more threads may each find a matching element before they notice the flag. Thus, you should perform a min reduction on the index:

int found = 0;
int first_index = INVALID_VALUE;
int iteration = 0;

#pragma omp parallel
{
   int my_index = INVALID_VALUE;
   int i;

   do
   {
      // Later versions of OpenMP allow for "atomic capture"
      // but OpenMP 2.0 requires a critical directive instead
      #pragma omp critical(iteration)
      {
         i = iteration++;
      }

      if (i < N && check(i))
      {
         found = 1;
         my_index = i;
      }
   } while (!found && i < N);

   #pragma omp critical(reduction)
   if (my_index != INVALID_VALUE)
   {
      if (first_index == INVALID_VALUE || my_index < first_index)
         first_index = my_index;
   }

   // Only needed if more code follows before the end of the region
   #pragma omp barrier

   ...
}

This code assumes that checking the condition for the i-th element (check(i)) does not alter the state of the element, so extra checks are harmless. Therefore, the worst that can happen is that the thread that found a matching element has to wait for all other threads to finish checking the elements they are currently working on, and that waiting time equals the longest of those in-progress checks.
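To make the fragment above compile on its own, it can be wrapped in a function. In this sketch, find_first_omp is a hypothetical name, the condition is passed in as a callable check(i), and -1 stands in for INVALID_VALUE. It uses only OpenMP 2.0 constructs; if the code is compiled without OpenMP support, the pragmas are ignored and the loop simply runs sequentially, returning the same index.

```cpp
#include <cassert>

// Hypothetical wrapper around the first approach. Returns the smallest index i
// in [0, N) for which check(i) is true, or -1 (standing in for INVALID_VALUE).
// check must be safe to call concurrently from several threads.
template <typename Check>
int find_first_omp(int N, Check check)
{
    assert(N >= 0);
    const int INVALID_VALUE = -1;
    int found = 0;                    // shared flag: some thread found a match
    int first_index = INVALID_VALUE;  // shared result of the min reduction
    int iteration = 0;                // shared counter handing out indices in order

    #pragma omp parallel
    {
        int my_index = INVALID_VALUE;
        int i;

        do
        {
            // OpenMP 2.0 has no atomic capture. Entry to and exit from a
            // critical section imply a flush, which propagates updates of found.
            #pragma omp critical(iteration)
            {
                i = iteration++;
            }

            // Every index handed out is checked even if found was set meanwhile,
            // so no index below the final answer is skipped
            if (i < N && check(i))
            {
                found = 1;
                my_index = i;
            }
        } while (!found && i < N);

        // Min reduction over the per-thread results
        #pragma omp critical(reduction)
        if (my_index != INVALID_VALUE)
        {
            if (first_index == INVALID_VALUE || my_index < first_index)
                first_index = my_index;
        }
    }
    return first_index;
}
```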

The critical construct in the do-loop is expensive. If check() itself is relatively cheap, consider handing out chunks of iterations instead of single iterations:

do
{
   #pragma omp critical(chunk)
   {
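      my_chunk = chunk++;
   }

   if (my_chunk >= N_chunks)
      break;

   // Scan the whole chunk (clipped to N) even if another thread has set found:
   // that thread's match may lie in a later chunk than this one
   for (i = my_chunk * chunk_size; i < (my_chunk+1)*chunk_size && i < N; i++)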
   {
      if (check(i))
      {
         found = 1;
         my_index = i;
         break;
      }
   }
} while (!found && my_chunk < N_chunks);
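A self-contained version of the chunked loop might look like the sketch below. find_first_omp_chunked is a hypothetical name, and computing N_chunks by rounding N / chunk_size up is an assumption, since the fragment leaves it undefined. The last chunk is clipped to N, and each thread finishes scanning its own chunk even after found has been set, because the match that set it may lie in a later chunk.

```cpp
#include <cassert>

// Hypothetical chunked variant: indices are handed out chunk_size at a time.
// Returns the smallest i in [0, N) with check(i) true, or -1 if there is none.
template <typename Check>
int find_first_omp_chunked(int N, int chunk_size, Check check)
{
    assert(N >= 0 && chunk_size > 0);
    const int N_chunks = (N + chunk_size - 1) / chunk_size;  // round up
    int found = 0;         // shared flag: some thread found a match
    int first_index = -1;  // shared result of the min reduction
    int chunk = 0;         // next chunk to hand out

    #pragma omp parallel
    {
        int my_index = -1;
        int my_chunk;

        do
        {
            #pragma omp critical(chunk)
            {
                my_chunk = chunk++;
            }

            if (my_chunk >= N_chunks)
                break;

            // Scan up to this thread's own first match; the last chunk may be short
            const int begin = my_chunk * chunk_size;
            const int end = (begin + chunk_size < N) ? begin + chunk_size : N;
            for (int i = begin; i < end; i++)
            {
                if (check(i))
                {
                    found = 1;
                    my_index = i;
                    break;
                }
            }
        } while (!found);

        // Min reduction over the per-thread results
        #pragma omp critical(reduction)
        if (my_index != -1 && (first_index == -1 || my_index < first_index))
            first_index = my_index;
    }
    return first_index;
}
```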

Another solution that works reasonably well when the number of elements is not that big and checking each one is expensive:

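int first_index = N;   // smallest matching index so far; N means "no match"

#pragma omp parallel
{
   int i;

   #pragma omp for schedule(dynamic,x)
   for (i = 0; i < N; i++)
   {
      // Compare with the best index so far rather than a plain found flag:
      // another thread may still hold lower indices when a match is recorded
      if (i < first_index && check(i))
      {
         #pragma omp critical(reduction)
         if (i < first_index)
            first_index = i;
      }
   }

   ...
}

Once a match has been recorded, iterations with a larger index run "empty" bodies because of the short-circuiting property of &&, while iterations with a smaller index are still checked.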
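The worksharing-loop approach can be wrapped in a complete function in the same way. Again, find_first_omp_dynamic is a hypothetical name, and the chunk size x becomes the chunk_size parameter. Instead of a plain found flag, the loop compares i with the smallest matching index recorded so far, so a thread that is still working on lower indices does not skip them. The function returns N when nothing matches.

```cpp
#include <cassert>

// Hypothetical wrapper for the schedule(dynamic) approach.
// Returns the smallest i in [0, N) with check(i) true, or N if there is none.
template <typename Check>
int find_first_omp_dynamic(int N, int chunk_size, Check check)
{
    assert(N >= 0 && chunk_size > 0);
    int best = N;  // shared: smallest matching index recorded so far

    #pragma omp parallel
    {
        int i;

        // Chunks of chunk_size iterations are handed out in increasing order
        #pragma omp for schedule(dynamic, chunk_size)
        for (i = 0; i < N; i++)
        {
            // best is read without a lock; since it only ever decreases, a stale
            // value can only cause extra check() calls, never a skipped index
            if (i < best && check(i))
            {
                #pragma omp critical(reduction)
                if (i < best)
                    best = i;
            }
        }
    }
    return best;
}
```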




Answer 2:


With OpenMP you can try a worksharing loop with #pragma omp for schedule(dynamic). Iterations are then handed out to the threads one at a time, in the same order as the elements of your vector. If you want each thread to check 4 elements at a time, try #pragma omp for schedule(dynamic, 4).



Source: https://stackoverflow.com/questions/40285046/find-first-of-a-vector-in-parallel-in-c
