Initialize variable for omp reduction

Submitted by 混江龙づ霸主 on 2021-02-19 08:40:24

Question


The OpenMP standard specifies an initial value for a reduction variable. So do I have to initialize the variable, and how would I do that in the following case:

int sum;
//...
for(int it=0;it<maxIt;it++){
#pragma omp parallel
{
  #pragma omp for nowait
  for(int i=0;i<ct;i++)
    arrayX[i]=arrayY[i];

  sum = 0;
  #pragma omp for reduction(+:sum)
  for(int i=0;i<ct;i++)
    sum+=arrayZ[i];
}
//Use sum
}

Note that I use only one parallel region to minimize overhead and to allow the nowait in the first loop. Using this as-is would lead to a data race (IMO), because threads that leave the first loop after other threads have already started the second loop will reset sum.
Of course I could do the initialization at the top of the outer loop, but in the general case, and in large code bases, one may forget that it needs to be (or already has been) set there, which produces unexpected results.
Does "omp single" help here? I suspect that while thread A executes the single, another thread may already have entered the reduction loop. An "omp barrier" is possible, but I want to avoid that as it defeats the "nowait".

And finally, another example:

#pragma omp parallel
{
  sum = 0;
  #pragma omp for reduction(+:sum)
  for(int i=0;i<ct;i++)
    sum+=arrayZ[i];
  //Use sum
  sum = 0;
  #pragma omp for reduction(+:sum)
  for(int i=0;i<ct;i++)
    sum+=arrayZ[i];
  //Use sum
}

How would I (re)initialize here?


回答1:


Edit: This answer is wrong as it makes an assumption that is not in the OpenMP specification. As accepted answers cannot be deleted, I'm leaving it here as an example that one should always doubt and validate code and/or statements found on the Internet.

Actually, the code doesn't exhibit data races:

#pragma omp parallel
{
   ...
   sum = 0;
   #pragma omp for reduction(+:sum)
   for(int i=0;i<ct;i++)
     sum+=arrayZ[i];
   ...
}

What happens here is that a private copy of sum is created inside the worksharing construct and is initialised to 0 (the initialisation value for the + operator). Each local copy is updated by the loop body. Once a given thread has finished, it waits at the implicit barrier present at the end of the for construct. Once all threads have reached the barrier, their local copies of sum are summed together and the result is added to the shared value.

It doesn't matter that all threads might execute sum = 0; at different times, since its value is only updated once the barrier has been reached. Think of the code above as performing something like:

...
sum = 0;
// Start of the for worksharing construct
int local_sum = 0;                     // ^
for(int i = i_start; i < i_end; i++)   // | sum not used here
  local_sum += arrayZ[i];              // v
// Implicit construct barrier
#pragma omp barrier
// Reduction
#pragma omp atomic update
sum += local_sum;
#pragma omp barrier
// End of the worksharing construct
...

The same applies to the second example.




回答2:


The OpenMP specification does not prescribe when or how the original value gets updated, and it mandates the use of synchronisation (OpenMP, p.205):

To avoid race conditions, concurrent reads or updates of the original list item must be synchronized with the update of the original list item that occurs as a result of the reduction computation.

In both examples, either a barrier after the assignment to sum or a single construct (without nowait) is needed in order to prevent race conditions.
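
Applied to the first example from the question, the single variant would look roughly like this (a sketch only; the barrier variant instead keeps the plain assignment to sum and places a #pragma omp barrier directly after it):

#pragma omp parallel
{
  #pragma omp for nowait
  for(int i=0;i<ct;i++)
    arrayX[i]=arrayY[i];

  #pragma omp single
  sum = 0;  // one thread resets sum; the implicit barrier at the end of the
            // single region keeps all threads out of the reduction loop until then

  #pragma omp for reduction(+:sum)
  for(int i=0;i<ct;i++)
    sum+=arrayZ[i];
}
//Use sum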



Source: https://stackoverflow.com/questions/22938901/initialize-variable-for-omp-reduction
