Atomic Minimum on x86 using OpenMP

问题

Does OpenMP support an atomic minimum for C++11? If OpenMP has no portable method: Is there some way of doing it using a x86 or amd64 feature?

In the OpenMP specifications I found nothing for C++ but the Fortran version seems to support it. See 2.8.5 of the v3.1 for the details. For C++ it states

binop is one of +, *, -, /, &, ^, |, <<, or >>.

but for Fortran it states

intrinsic_procedure_name is one of MAX, MIN, IAND, IOR, or IEOR.

In case you are interested in more context: I am looking for a mutex free method of doing the following:

vector<omp_lock_t>lock;
vector<int>val;

#pragma omp parallel
{
  // ...
  int x = ...;
  int y = ...;
  if(y < val[x]){
    omp_set_lock(&lock[x]);
    if(y < val[x])
      val[x] = y;
    omp_unset_lock(&lock[x]);
  }
}

I know that you can compute the minimum using a reduce algorithm. I know that there are circumstances where this largely outperforms any atomic minimum approach. However, I also know that this is not the case in my situation.

EDIT: One option that is slightly faster in my case is

  int x = ...;
  int y = ...;
  while(y < val[x])
    val[x] = y;

but that is no atomic operation.

All the newer GPUs have this feature and I am missing it on the CPU. (See atom_min for OpenCL.)

回答1:

The OpenMP specification for C++ does not have support for atomic minimum. Neither does C++11.

I am assuming that in your algorithm, x can compute to any valid index, regardless of thread. I would suggest changing your algorithm, so that each thread uses its own val array and then do a final reconciliation at the end, which can also be parallelized by index. This will avoid locks and atomics completely and give you the benefit of separating the data for each thread, i.e. no chance for false cache sharing. In other words, it should be faster.

来源：https://stackoverflow.com/questions/12255266/atomic-minimum-on-x86-using-openmp

标签

c++

x86

openmp

atomic

minimum