How does OpenMP handle nested loops?

后端 未结 3 1026
温柔的废话
温柔的废话 2020-11-30 03:02

Does the following code just parallelize the first (outer) loops, or it parallelize the entire nested loops?

    #pragma omp parallel for
    for (int i=0;i&         


        
相关标签:
3条回答
  • 2020-11-30 03:29

    You will be able to better understand this with the following example. Let's do this with two threads.

    #pragma omp parallel for num_threads(2)
    for(int i=0; i< 3; i++) {
        for (int j=0; j< 3; j++) {
            printf("i = %d, j= %d, threadId = %d \n", i, j, omp_get_thread_num());
        }
    }
    

    then the result will be,

    i = 0, j= 0, threadId = 0 
    i = 0, j= 1, threadId = 0 
    i = 0, j= 2, threadId = 0 
    i = 1, j= 0, threadId = 0 
    i = 1, j= 1, threadId = 0 
    i = 1, j= 2, threadId = 0 
    i = 2, j= 0, threadId = 1 
    i = 2, j= 1, threadId = 1 
    i = 2, j= 2, threadId = 1
    

    That means, when you add #pragma omp parallel for to the uppermost for loop, the index of that for loop is divided among the threads. As you can see, when index of i is same the thread id is also the same.

    Instead of that, we can parallel the combinations that we have in a nested for loop. In this example we can have following combinations of i and j.

    i = 0, j= 0
    i = 0, j= 1
    i = 0, j= 2
    i = 1, j= 0
    i = 1, j= 1
    i = 1, j= 2
    i = 2, j= 0
    i = 2, j= 1
    i = 2, j= 2
    

    In order to parallelize the code combination wise, we can add the collapse keyword as follows.

    #pragma omp parallel for num_threads(2) collapse(2)
    for(int i=0; i< 3; i++) {
        for (int j=0; j< 3; j++) {
            printf("i = %d, j= %d, threadId = %d \n", i, j, omp_get_thread_num());
        }
    }
    

    then the result will be as follows.

    i = 0, j= 0, threadId = 0 
    i = 0, j= 1, threadId = 0 
    i = 1, j= 2, threadId = 1 
    i = 2, j= 0, threadId = 1 
    i = 2, j= 1, threadId = 1 
    i = 2, j= 2, threadId = 1 
    i = 0, j= 2, threadId = 0 
    i = 1, j= 0, threadId = 0 
    i = 1, j= 1, threadId = 0 
    

    Then you can see that unlike before, for the same index i, there can be different thread ids ( when (i=1 and j=2 threadId=1) also (i=1 and j=0 threadId=0)). That means in this scenario, the combinations of i and j are divided among the threads.

    0 讨论(0)
  • 2020-11-30 03:34

    The lines you have written will parallelize only the outer loop. To parallelize both you need to add a collapse clause:

    #pragma omp parallel for collapse(2)
        for (int i=0;i<N;i++)
        { 
          for (int j=0;j<M;j++)
          {
           //do task(i,j)//
          }
        }
    

    You may want to check OpenMP 3.1 specifications (sec 2.5.1) for more details.

    0 讨论(0)
  • 2020-11-30 03:52

    OpenMP only parallelizes the loop next to the pragma. You can parallelize the inner loop too if you want to but it won't be done automatically.

    0 讨论(0)
提交回复
热议问题