Two openmp ordered blocks with no resulting parallelization

Submitted by 依然范特西╮ on 2019-12-13 22:18:43

Question


I am writing a Fortran program that needs to have reproducible results (for publication). My understanding of the following code is that it should be reproducible.

program main
implicit none
real(8) :: ybest,xbest,x,y
integer :: i

ybest = huge(0d0)
!$omp parallel do ordered private(x,y) shared(ybest,xbest) schedule(static,1)
do i = 1,10
    !$omp ordered
    !$omp critical
    call random_number(x)
    !$omp end critical
    !$omp end ordered

    ! Do a lot of work
    call sleep(1)
    y = -1d0

    !$omp ordered
    !$omp critical
    if (y<ybest) then
        ybest = y
        xbest = x
    end if
    !$omp end critical
    !$omp end ordered
end do
!$omp end parallel do

end program

In my case, a function that takes a long time to compute stands in place of "sleep", and I want it to run in parallel. According to the OpenMP standard, should sleep in this example execute in parallel? I thought it should (based on the question "How does the omp ordered clause work?"), but with gfortran 5.2.0 (Mac) and gfortran 5.1.0 (Linux) it does not execute in parallel (at least, there is no speedup). The timing results are below.

Also, my guess is that the critical statements are not necessary, but I am not completely sure.

Thanks.

-Edit-

In response to Vladimir's comments, I added a full working program with timing results.

#!/bin/bash
mpif90 main.f90
time ./a.out
mpif90 main.f90 -fopenmp
time ./a.out

The code runs with the following times (first compiled without OpenMP, then with -fopenmp):

real    0m10.047s
user    0m0.003s
sys 0m0.003s

real    0m10.037s
user    0m0.003s
sys 0m0.004s

BUT, if you comment out the ordered blocks, it runs with the following times:

real    0m10.044s
user    0m0.002s
sys 0m0.003s

real    0m3.021s
user    0m0.002s
sys 0m0.004s
-Edit-

In response to innoSPG, here are the results for a non-trivial function in place of sleep:

real(8) function f(x)
    implicit none
    real(8), intent(in) :: x
    ! local
    real(8) :: tmp
    integer :: i
    tmp = 0d0
    do i = 1,10000000
        tmp = tmp + cos(sin(x))/real(i,8)
    end do
    f = tmp
end function


real    0m2.229s --- no openmp
real    0m2.251s --- with openmp and ordered
real    0m0.773s --- with openmp but ordered commented out

Answer 1:


This program does not conform to the OpenMP standard. Specifically, the problem is that you have more than one ordered region and every iteration of your loop executes both of them. The OpenMP 4.0 standard has this to say (2.12.8, Restrictions, line 16, p 139):

During execution of an iteration of a loop or a loop nest within a loop region, a thread must not execute more than one ordered region that binds to the same loop region.

If you have more than one ordered region, you must have conditional code paths such that only one of them can be executed for any loop iteration.
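As a minimal sketch of what such conditional code paths could look like (the program name and the even/odd split are purely illustrative assumptions, not taken from the question), the loop below contains two ordered regions, but the if/else guarantees that any given iteration executes exactly one of them:

```fortran
program one_ordered_per_iteration
    ! Hypothetical illustration: two ordered regions appear in the source,
    ! but each iteration takes only one branch and so executes only one.
    implicit none
    integer :: i

    !$omp parallel do ordered schedule(static,1)
    do i = 1, 8
        if (mod(i,2) == 0) then
            !$omp ordered
            print *, 'even iteration', i
            !$omp end ordered
        else
            !$omp ordered
            print *, 'odd iteration', i
            !$omp end ordered
        end if
    end do
    !$omp end parallel do
end program one_ordered_per_iteration
```

Compiled with -fopenmp, the lines should come out in iteration order, because ordered regions are executed in the order of the loop iterations.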


It is also worth noting that the position of your ordered region seems to have performance implications. Testing with gfortran 5.2, it appears that everything after the ordered region is executed in order for each loop iteration. Placing the ordered block at the beginning of the loop therefore leads to serial performance, while placing it at the end does not, because the code before the block is parallelized. The effect with ifort 15 is not as dramatic, but I would still recommend structuring your code so that the ordered block occurs after any code that needs parallelization in a loop iteration rather than before it.
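One possible restructuring along these lines is sketched below. It is an assumption about how the question's program could be rearranged, not a tested fix from the original answer: the random inputs are drawn serially into an array before the loop (which removes the first ordered region), and the single remaining ordered region sits at the end of the loop body. The array x(n), the parameter n, and the contained function work are hypothetical names; work reuses the test function f from the question as a stand-in for the expensive computation.

```fortran
program main
    implicit none
    integer, parameter :: n = 10          ! hypothetical: number of trials
    real(8) :: ybest, xbest, y
    real(8) :: x(n)                       ! hypothetical: pre-drawn inputs
    integer :: i

    ! Draw every input serially, so x(i) does not depend on thread scheduling
    call random_number(x)

    ybest = huge(0d0)
    xbest = 0d0
    !$omp parallel do ordered private(y) shared(x,ybest,xbest) schedule(static,1)
    do i = 1, n
        ! Expensive work comes first and can run concurrently
        y = work(x(i))

        ! The only ordered region, placed last. Ordered regions run one at a
        ! time in iteration order, so no critical section is used here.
        !$omp ordered
        if (y < ybest) then
            ybest = y
            xbest = x(i)
        end if
        !$omp end ordered
    end do
    !$omp end parallel do

    print *, 'xbest =', xbest, ' ybest =', ybest

contains

    ! Stand-in for the long computation (same loop as f in the question)
    real(8) function work(xin)
        real(8), intent(in) :: xin
        integer :: k
        work = 0d0
        do k = 1, 10000000
            work = work + cos(sin(xin))/real(k,8)
        end do
    end function work

end program main
```

Because the comparison uses a strict less-than and the ordered region is entered in iteration order, ties are resolved in favor of the lowest iteration index regardless of how many threads are used.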



Source: https://stackoverflow.com/questions/32076108/two-openmp-ordered-blocks-with-no-resulting-parallelization
