OpenMP and nested parallelism

Submitted by 旧巷老猫 on 2019-12-11 01:05:27

Question


I would like to "nest" parallel for loops using OpenMP. Here is a toy example:

#include <iostream>
#include <cmath>

void subproblem(int m) {
  #pragma omp parallel for
  for (int j{0}; j < m; ++j) {
    double sum{0.0};
    for (int k{0}; k < 10000000; ++k) {
      sum += std::cos(static_cast<double>(k));
    }
    #pragma omp critical
    { std::cout << "Sum: " << sum << std::endl; }
  }
}

int main(int argc, const char *argv[]) {
  int n{2};
  int m{8};

  #pragma omp parallel for
  for (int i{0}; i < n; ++i) {
    subproblem(m);
  }

  return 0;
}

Here is what I want:

  • If n >= (number of cores on my machine), I want only the first loop to be parallelized.
  • If n < (number of cores on my machine), I want OpenMP to launch threads in the inner loop, but I don't want the total number of threads to exceed the number of cores on my machine.

So far, I have only found solutions that either disable nested parallelism or always allow it, but I am looking for a way to enable it only when the number of threads launched is below the number of cores.

Is there an OpenMP solution for that using tasks?


Answer 1:


Rather than using a pair of nested parallel regions, you can tell OpenMP to "collapse" the nested loops into a single parallel loop over the n*m iteration space:

#pragma omp parallel for collapse(2)
for (int i{0}; i < n; ++i) {
  for (int j{0}; j < m; ++j) {
    // ...
  }
}

This will allow it to divide the work appropriately regardless of the relative values of n and m.
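Applied to the toy code from the question, the idea might look like the sketch below. It assumes the inner loop can be moved out of subproblem() so that both loops are directly nested, which collapse requires. The names partial_sum and run_collapsed, the iterations parameter, and the result vector are illustrative and not part of the original code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-(i, j) work in the toy code.
// `iterations` is an illustrative parameter (the question uses 10000000).
double partial_sum(int iterations) {
  double sum{0.0};
  for (int k{0}; k < iterations; ++k) {
    sum += std::cos(static_cast<double>(k));
  }
  return sum;
}

// Runs all n*m work items in a single parallel region.
// If the compiler is not run with OpenMP enabled, the pragma is ignored
// and the loops simply execute serially.
std::vector<double> run_collapsed(int n, int m, int iterations) {
  assert(n >= 0 && m >= 0);
  std::vector<double> sums(static_cast<std::size_t>(n) * m, 0.0);

  #pragma omp parallel for collapse(2)
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < m; ++j) {
      // Each (i, j) pair writes to its own slot, so no critical section
      // is needed here.
      sums[static_cast<std::size_t>(i) * m + j] = partial_sum(iterations);
    }
  }
  return sums;
}
```

Calling run_collapsed(2, 8, 10000000) would correspond to the n and m values used in the question. Storing the sums and printing them afterwards, instead of printing inside a critical section, is a design choice of this sketch rather than something the answer prescribes.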




Answer 2:


OMP_NUM_THREADS - Specifies the default number of threads to use in parallel regions. The value of this variable shall be a comma-separated list of positive integers; each value specifies the number of threads to use for the corresponding nesting level. If undefined, one thread per CPU is used. (from here)

omp_get_max_threads - maximum number of threads that are available to do work (from here)

omp_get_num_threads - number of threads in the current team (from here)

But AFAIK there is no function that returns the total number of running threads, which is what you are asking for:

"I don't want the total number of threads to exceed the number of cores on my machine"

Also look at this question
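To see how the two functions listed in this answer behave, a small sketch like the following may help. The ThreadInfo struct and query_threads function are hypothetical names, and omp_get_num_procs() is an addition that the answer does not mention. The `_OPENMP` guards are only there so the code also builds when OpenMP is not enabled.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical container for the values reported by the OpenMP runtime.
struct ThreadInfo {
  int procs;         // processors available to the program
  int max_threads;   // upper bound on the size of a new team
  int team_outside;  // team size seen outside any parallel region
  int team_inside;   // team size seen inside a parallel region
};

ThreadInfo query_threads() {
  ThreadInfo info{1, 1, 1, 1};  // serial fallback values
#ifdef _OPENMP
  info.procs = omp_get_num_procs();
  info.max_threads = omp_get_max_threads();
  // Outside a parallel region the "team" is just the initial thread.
  info.team_outside = omp_get_num_threads();
  #pragma omp parallel
  {
    // Only one thread records the value, which avoids a data race.
    #pragma omp single
    info.team_inside = omp_get_num_threads();
  }
#endif
  return info;
}
```

Note that omp_get_num_threads() only reports the size of the innermost team, so with nested regions it does not give the total number of threads across all levels, which matches the limitation described above.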




Answer 3:


Doesn't the if clause of the parallel construct just do it all for you? Here is what the 4.0 OpenMP standard says on page 44:

The syntax of the parallel construct is as follows:

#pragma omp parallel [clause[ [, ]clause] ...] new-line
  structured-block

where clause is one of the following:
  if(scalar-expression)
  num_threads(integer-expression)
  default(shared | none)
  private(list)
  firstprivate(list)
  shared(list)
  copyin(list)
  reduction(reduction-identifier:list)
  proc_bind(master | close | spread)

I didn't try, but I guess that using the if clause just the way you described your two bullet points for whether n is greater than the number of cores on your machine might just work... Would you care to give it a try and let us know?
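A minimal, untested-in-the-original sketch of that suggestion could look as follows. It assumes the core count is passed in as a parameter (cores), and it calls omp_set_max_active_levels(2) to permit one level of nesting, which is an addition the answer does not mention. The names run_nested and NestedResult are hypothetical, and the loop body is reduced to counting work items so the thread behaviour is easy to check.

```cpp
#include <algorithm>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical result type: how many (i, j) items ran, and the largest
// inner team size that was observed.
struct NestedResult {
  int items_done;
  int max_inner_team;
};

NestedResult run_nested(int n, int m, int cores) {
  // Assumption: split the cores so that outer * inner <= cores.
  const int outer_threads = std::max(1, std::min(n, cores));
  const int inner_threads = std::max(1, cores / outer_threads);
  const bool nest = n < cores;  // second bullet point of the question

#ifdef _OPENMP
  omp_set_max_active_levels(2);  // allow one nested level (assumption)
#endif

  int items_done = 0;
  int max_inner_team = 1;
  #pragma omp parallel for num_threads(outer_threads) \
      reduction(+:items_done) reduction(max:max_inner_team)
  for (int i = 0; i < n; ++i) {
    int done = 0;
    int team = 1;
    // When `nest` is false, the if clause makes this region run with a
    // team of one thread, so only the outer loop is parallel.
    #pragma omp parallel for if(nest) num_threads(inner_threads) \
        reduction(+:done) reduction(max:team)
    for (int j = 0; j < m; ++j) {
      done += 1;
#ifdef _OPENMP
      team = std::max(team, omp_get_num_threads());
#endif
    }
    items_done += done;
    max_inner_team = std::max(max_inner_team, team);
  }
  return {items_done, max_inner_team};
}
```

With n = 2, m = 8 and cores = 4, this requests two outer threads with up to two inner threads each. With n = 8 and cores = 4, the if clause keeps the inner loop serial. The runtime may still provide fewer threads than requested, so the split is an upper bound rather than a guarantee.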



Source: https://stackoverflow.com/questions/32476024/openmp-and-nested-parallelism
