问题
I am writing a Fortran program that needs to have reproducible results (for publication). My understanding of the following code is that it should be reproducible.
program main
implicit none
real(8) :: ybest,xbest,x,y
integer :: i
ybest = huge(0d0)
!$omp parallel do ordered private(x,y) shared(ybest,xbest) schedule(static,1)
do i = 1,10
!$omp ordered
!$omp critical
call random_number(x)
!$omp end critical
!$omp end ordered
! Do a lot of work
call sleep(1)
y = -1d0
!$omp ordered
!$omp critical
if (y<ybest) then
ybest = y
xbest = x
end if
!$omp end critical
!$omp end ordered
end do
!$omp end parallel do
end program
In my case, there is a function in place of "sleep" that takes long time to compute, and I want it done in parallel. According to OpenMP standards, should sleep in this example execute in parallel? I thought it should be (based on this How does the omp ordered clause work?), but with gfortran 5.2.0 (mac) and gfortran 5.1.0 (linux) it is not executing in parallel (at least, there is no speedup from it). The timing results are below.
Also, my guess is the critical statements are not necessary, but I wasn't completely sure.
Thanks.
-Edit-
In response to Vladmir's comments, I added a full working program with timing results.
#!/bin/bash
mpif90 main.f90
time ./a.out
mpif90 main.f90 -fopenmp
time ./a.out
The code runs as
real 0m10.047s
user 0m0.003s
sys 0m0.003s
real 0m10.037s
user 0m0.003s
sys 0m0.004s
BUT, if you comment out the ordered blocks, it runs with the following times:
real 0m10.044s
user 0m0.002s
sys 0m0.003s
real 0m3.021s
user 0m0.002s
sys 0m0.004s
- Edit -
In response to innoSPG, here are the results for a non-trivial function in place of sleep:
real(8) function f(x)
implicit none
real(8), intent(in) :: x
! local
real(8) :: tmp
integer :: i
tmp = 0d0
do i = 1,10000000
tmp = tmp + cos(sin(x))/real(i,8)
end do
f = tmp
end function
real 0m2.229s --- no openmp
real 0m2.251s --- with openmp and ordered
real 0m0.773s --- with openmp but ordered commented out
回答1:
This program is non-conforming to the OpenMP standard. Specifically, the problem is that you have more than one ordered
region and every iteration of your loop will execute both of them. The OpenMP 4.0 standard has this to say (2.12.8, Restrictions, line 16, p 139):
During execution of an iteration of a loop or a loop nest within a loop region, a thread must not execute more than one ordered region that binds to the same loop region.
If you have more than one ordered
region, you must have conditional code paths such that only one of them can be executed for any loop iteration.
It is also worth noting the position of your ordered region seems to have performance implications. Testing with gfortran 5.2, it appears everything after the ordered region is executed in order for each loop iteration, so having the ordered block at the beginning of the loop leads to serial performance while having the ordered block at the end of the loop does not have this implication as the code before the block is parallelized. Testing with ifort 15 is not as dramatic but I would still recommend structuring your code so your ordered block occurs after any code than needs parallelization in a loop iteration rather than before.
来源:https://stackoverflow.com/questions/32076108/two-openmp-ordered-blocks-with-no-resulting-parallelization