Atomic Minimum on x86 using OpenMP

允我心安 提交于 2021-02-07 06:15:49

问题


Does OpenMP support an atomic minimum for C++11? If OpenMP has no portable method: Is there some way of doing it using a x86 or amd64 feature?

In the OpenMP specifications I found nothing for C++ but the Fortran version seems to support it. See 2.8.5 of the v3.1 for the details. For C++ it states

binop is one of +, *, -, /, &, ^, |, <<, or >>.

but for Fortran it states

intrinsic_procedure_name is one of MAX, MIN, IAND, IOR, or IEOR.

In case you are interested in more context: I am looking for a mutex free method of doing the following:

vector<omp_lock_t>lock;
vector<int>val;

#pragma omp parallel
{
  // ...
  int x = ...;
  int y = ...;
  if(y < val[x]){
    omp_set_lock(&lock[x]);
    if(y < val[x])
      val[x] = y;
    omp_unset_lock(&lock[x]);
  }
}

I know that you can compute the minimum using a reduce algorithm. I know that there are circumstances where this largely outperforms any atomic minimum approach. However, I also know that this is not the case in my situation.

EDIT: One option that is slightly faster in my case is

  int x = ...;
  int y = ...;
  while(y < val[x])
    val[x] = y;

but that is no atomic operation.

All the newer GPUs have this feature and I am missing it on the CPU. (See atom_min for OpenCL.)


回答1:


The OpenMP specification for C++ does not have support for atomic minimum. Neither does C++11.

I am assuming that in your algorithm, x can compute to any valid index, regardless of thread. I would suggest changing your algorithm, so that each thread uses its own val array and then do a final reconciliation at the end, which can also be parallelized by index. This will avoid locks and atomics completely and give you the benefit of separating the data for each thread, i.e. no chance for false cache sharing. In other words, it should be faster.



来源:https://stackoverflow.com/questions/12255266/atomic-minimum-on-x86-using-openmp

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!