OpenMP nested parallel for loops vs. inner parallel for

孤城傲影 2020-12-02 19:08

If I use nested parallel for loops like this:

#pragma omp parallel for schedule(dynamic,1)
for (int x = 0; x < x_max; ++x) {
    #pragma omp parallel for          


        
2 Answers
  • 2020-12-02 19:29

    If your compiler supports OpenMP 3.0, you can use the collapse clause:

    #pragma omp parallel for schedule(dynamic,1) collapse(2)
    for (int x = 0; x < x_max; ++x) {
        for (int y = 0; y < y_max; ++y) {
            // parallelize this code here
        }
        // IMPORTANT: no code here; collapse requires perfectly nested loops
    }
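
As a minimal sketch of the collapse approach, the function below fills a row-major grid from one collapsed loop. The helper name `fill_grid_collapsed` and the grid layout are assumptions made for illustration; they are not part of the original question. Without OpenMP enabled, the pragma is ignored and the loops run serially with the same result.

```c
#include <stddef.h>

/* Hypothetical helper: fills a row-major x_max * y_max grid so that
   grid[x * y_max + y] == x * y_max + y, using one collapsed parallel loop.
   Returns 0 on success, -1 on invalid arguments. */
int fill_grid_collapsed(int *grid, int x_max, int y_max) {
    if (grid == NULL || x_max <= 0 || y_max <= 0) return -1;

    /* collapse(2) merges both loops into a single iteration space of
       x_max * y_max iterations that one team of threads shares. */
    #pragma omp parallel for schedule(dynamic,1) collapse(2)
    for (int x = 0; x < x_max; ++x) {
        for (int y = 0; y < y_max; ++y) {
            /* Each (x, y) pair is handled by exactly one thread. */
            grid[x * y_max + y] = x * y_max + y;
        }
    }
    return 0;
}
```

Because every cell is written by a single iteration, no synchronization is needed inside the loop body.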
    

    If it doesn't (e.g. only OpenMP 2.5 is supported), there is a simple workaround:

    #pragma omp parallel for schedule(dynamic,1)
    for (int xy = 0; xy < x_max*y_max; ++xy) {
        int x = xy / y_max;
        int y = xy % y_max;
        //parallelize this code here
    }
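
One thing to watch with the flattened loop is that `x_max*y_max` can overflow an `int` for large grids. The sketch below adds a guard for that and counts how often each cell is visited, to show that the `/` and `%` mapping reaches every cell exactly once. The helper name `visit_flattened` and the `visits` array are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: increments visits[x * y_max + y] once for every
   (x, y) pair reached by the flattened loop.
   Returns 0 on success, -1 on invalid arguments or int overflow. */
int visit_flattened(int *visits, int x_max, int y_max) {
    if (visits == NULL || x_max <= 0 || y_max <= 0) return -1;
    if (x_max > INT_MAX / y_max) return -1;  /* x_max * y_max would overflow */
    const int total = x_max * y_max;

    #pragma omp parallel for schedule(dynamic,1)
    for (int xy = 0; xy < total; ++xy) {
        int x = xy / y_max;  /* row index */
        int y = xy % y_max;  /* column index */
        /* Distinct xy values map to distinct cells, so there is no race. */
        visits[x * y_max + y] += 1;
    }
    return 0;
}
```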
    

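    You can also enable nested parallelism with omp_set_nested(1); (deprecated since OpenMP 5.0 in favor of omp_set_max_active_levels), and your nested omp parallel for code will then work, but that might not be the best idea: every outer thread then starts its own inner team, which can easily oversubscribe the available cores.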

    By the way, why the dynamic scheduling? Does the run time vary from one loop iteration to the next?
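
To illustrate when dynamic scheduling tends to pay off, here is a sketch with a deliberately uneven workload: iteration x runs x + 1 inner steps. The function `triangular_sum` and its `use_dynamic` flag are made up for this example. Both schedules return the same sum; only the distribution of work across threads differs, and whether dynamic is actually faster depends on the machine and the loop body.

```c
/* Hypothetical workload: iteration x costs x + 1 inner steps, so a static
   split gives the thread that owns the largest x values the most work.
   Returns the sum of all y with 0 <= y <= x < n. */
long long triangular_sum(int n, int use_dynamic) {
    long long total = 0;
    if (use_dynamic) {
        /* Threads grab one x at a time, which evens out the uneven rows. */
        #pragma omp parallel for schedule(dynamic,1) reduction(+:total)
        for (int x = 0; x < n; ++x) {
            for (int y = 0; y <= x; ++y) total += y;
        }
    } else {
        /* Each thread gets a fixed contiguous block of x values up front. */
        #pragma omp parallel for schedule(static) reduction(+:total)
        for (int x = 0; x < n; ++x) {
            for (int y = 0; y <= x; ++y) total += y;
        }
    }
    return total;
}
```

If every iteration costs about the same, schedule(static) usually has lower overhead.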

  • 2020-12-02 19:29

    NO.

    The first #pragma omp parallel creates a team of threads, and the second then tries to create another team for each of those threads, i.e. a team of teams. However, in almost all existing implementations nested parallelism is off by default, so each inner team has only one thread and the second parallel region is essentially unused. Your code is therefore roughly equivalent to

    #pragma omp parallel for schedule(dynamic,1)
    for (int x = 0; x < x_max; ++x) {
        // each thread takes one x at a time
        for (int y = 0; y < y_max; ++y) { 
            // code here: each thread loops all y
        }
    }
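
The claim about the inner team can be checked with a small probe. This is only a sketch: the helper `inner_team_size` and its parameters are hypothetical, and it changes global OpenMP settings as a side effect. It relies on omp_set_max_active_levels, omp_set_dynamic, omp_get_ancestor_thread_num and omp_get_num_threads, all available since OpenMP 3.0. When compiled without OpenMP it falls back to returning 1.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical probe: returns the size of the inner team when at most
   `levels` nested parallel regions may be active. Note that it modifies
   the runtime's max-active-levels and dynamic-adjustment settings. */
int inner_team_size(int levels, int outer_threads, int inner_threads) {
    int size = 1;  /* serial fallback when OpenMP is not enabled */
#ifdef _OPENMP
    omp_set_dynamic(0);                 /* ask for exactly the requested sizes */
    omp_set_max_active_levels(levels);  /* 1 = no nesting, 2 = one nested level */
    #pragma omp parallel num_threads(outer_threads)
    {
        #pragma omp parallel num_threads(inner_threads)
        {
            /* Only the master of the inner team spawned by outer thread 0
               records the result, so there is a single writer. */
            if (omp_get_ancestor_thread_num(1) == 0 && omp_get_thread_num() == 0) {
                size = omp_get_num_threads();
            }
        }
    }
#else
    (void)levels; (void)outer_threads; (void)inner_threads;
#endif
    return size;
}
```

With `levels` set to 1 the inner region is serialized and the probe reports a team of one, which matches the behaviour described above. With `levels` set to 2 the runtime may give the inner team up to `inner_threads` threads.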
    

    If you don't want that and would rather parallelise only the inner loop, you can do this:

    #pragma omp parallel
    for (int x = 0; x < x_max; ++x) {
        // each thread loops over all x
    #pragma omp for schedule(dynamic,1)
        for (int y = 0; y < y_max; ++y) { 
            // code here: each thread takes one y at a time
        }
    }
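
Here is a sketch of the same pattern doing actual work, assuming a made-up task: compute, for every x, the sum of x + y over all y. The helper `row_sums_inner_parallel` and the `out` array are illustrative only. The team is created once, every thread runs the whole x loop, and the `omp for` hands out the y iterations of each row.

```c
#include <stddef.h>

/* Hypothetical helper: out[x] = sum over 0 <= y < y_max of (x + y).
   Returns 0 on success, -1 on invalid arguments. */
int row_sums_inner_parallel(long *out, int x_max, int y_max) {
    if (out == NULL || x_max <= 0 || y_max <= 0) return -1;
    for (int x = 0; x < x_max; ++x) out[x] = 0;

    /* One team for the whole computation; x is private to each thread. */
    #pragma omp parallel
    for (int x = 0; x < x_max; ++x) {
        long partial = 0;  /* per-thread share of row x */

        /* Worksharing loop: the y iterations are divided among the team,
           with an implicit barrier at the end. */
        #pragma omp for schedule(dynamic,1)
        for (int y = 0; y < y_max; ++y) {
            partial += x + y;
        }

        /* Combine the per-thread shares without a data race. */
        #pragma omp atomic
        out[x] += partial;
    }
    return 0;
}
```

Compared with putting `parallel for` on the inner loop, this avoids starting a new parallel region for every x, at the cost of one barrier per row.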
    