Question
I have a program which distributes particles onto a cloud-in-cell mesh. It simply loops over the total number of particles (Ntot) and populates a 256^3 mesh (i.e. each particle gets distributed over 8 cells).
% gfortran -fopenmp cic.f90 -o ./cic
This compiles fine, but when I run it (./cic) I get a segmentation fault. I suspect my looping is a classic omp do problem. The program works when I don't compile it with OpenMP.
!$omp parallel do
do i = 1,Ntot
   if (x1(i).gt.0.and.y1(i).gt.0.and.z1(i).gt.0) then
      dense(int(x1(i)),int(y1(i)),int(z1(i))) = dense(int(x1(i)),int(y1(i)),int(z1(i))) &
           + dx1(i) * dy1(i) * dz1(i) * mpart
   end if
   if (x2(i).le.Ng.and.y1(i).gt.0.and.z1(i).gt.0) then
      dense(int(x2(i)),int(y1(i)),int(z1(i))) = dense(int(x2(i)),int(y1(i)),int(z1(i))) &
           + dx2(i) * dy1(i) * dz1(i) * mpart
   end if
   if (x1(i).gt.0.and.y2(i).le.Ng.and.z1(i).gt.0) then
      dense(int(x1(i)),int(y2(i)),int(z1(i))) = dense(int(x1(i)),int(y2(i)),int(z1(i))) &
           + dx1(i) * dy2(i) * dz1(i) * mpart
   end if
   if (x2(i).le.Ng.and.y2(i).le.Ng.and.z1(i).gt.0) then
      dense(int(x2(i)),int(y2(i)),int(z1(i))) = dense(int(x2(i)),int(y2(i)),int(z1(i))) &
           + dx2(i) * dy2(i) * dz1(i) * mpart
   end if
   if (x1(i).gt.0.and.y1(i).gt.0.and.z2(i).le.Ng) then
      dense(int(x1(i)),int(y1(i)),int(z2(i))) = dense(int(x1(i)),int(y1(i)),int(z2(i))) &
           + dx1(i) * dy1(i) * dz2(i) * mpart
   end if
   if (x2(i).le.Ng.and.y1(i).gt.0.and.z2(i).le.Ng) then
      dense(int(x2(i)),int(y1(i)),int(z2(i))) = dense(int(x2(i)),int(y1(i)),int(z2(i))) &
           + dx2(i) * dy1(i) * dz2(i) * mpart
   end if
   if (x1(i).gt.0.and.y2(i).le.Ng.and.z2(i).le.Ng) then
      dense(int(x1(i)),int(y2(i)),int(z2(i))) = dense(int(x1(i)),int(y2(i)),int(z2(i))) &
           + dx1(i) * dy2(i) * dz2(i) * mpart
   end if
   if (x2(i).le.Ng.and.y2(i).le.Ng.and.z2(i).le.Ng) then
      dense(int(x2(i)),int(y2(i)),int(z2(i))) = dense(int(x2(i)),int(y2(i)),int(z2(i))) &
           + dx2(i) * dy2(i) * dz2(i) * mpart
   end if
end do
!$omp end parallel do
There are no dependencies between iterations. Ideas?
Answer 1:
This problem, as well as the one in your other question, comes from the fact that automatic heap arrays are disabled when OpenMP is enabled. This means that without -fopenmp, big arrays are automatically placed in static storage (known as the .bss segment) while small arrays are allocated on the stack. When you switch OpenMP support on, no automatic static allocation is used and your dense array gets allocated on the stack of the routine. The default stack limits on OS X are very restrictive, hence the segmentation fault.
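For concreteness, a local declaration along the following lines (hypothetical, since the question does not show how dense is declared) takes 256^3 * 4 bytes, roughly 64 MiB in single precision, which is far beyond typical default stack limits once -fopenmp causes it to live on the stack:
REAL, DIMENSION(256,256,256) :: dense   ! ~64 MiB local array; ends up on the stack when compiled with -fopenmp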
You have several options here. The first option is to make dense have static allocation by giving it the SAVE attribute. The other option is to explicitly allocate it on the heap by making it ALLOCATABLE and then using the ALLOCATE statement, e.g.:
REAL, DIMENSION(:,:,:), ALLOCATABLE :: dense
ALLOCATE(dense(256,256,256))
! Computations, computations, computations
DEALLOCATE(dense)
Newer Fortran versions support automatic deallocation of arrays without the SAVE attribute when they go out of scope.
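For comparison, the first option (static allocation via the SAVE attribute) is a one-line change. A minimal sketch, assuming dense is single precision with the same 256^3 shape as above:
REAL, DIMENSION(256,256,256), SAVE :: dense   ! SAVE places the array in static storage instead of on the stack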
Note that your OpenMP directive is just fine and no additional data-sharing clauses are necessary. You do not need to declare i in a PRIVATE clause since loop counters have a predetermined private data-sharing class. You do not need to put the other variables in a SHARED clause as they are implicitly shared. Yet the operations that you do on dense should either be synchronised with ATOMIC UPDATE (or simply ATOMIC on older OpenMP implementations) or you should use REDUCTION(+:dense). Atomic updates are translated to locked additions and should not incur much of a slowdown, compared to the huge slowdown from having conditionals inside the loop:
INTEGER :: xi, yi, zi
!$OMP PARALLEL DO PRIVATE(xi,yi,zi)
...
   if (x1(i).gt.0.and.y1(i).gt.0.and.z1(i).gt.0) then
      xi = int(x1(i))
      yi = int(y1(i))
      zi = int(z1(i))
      !$OMP ATOMIC UPDATE
      dense(xi,yi,zi) = dense(xi,yi,zi) &
           + dx1(i) * dy1(i) * dz1(i) * mpart
   end if
...
Replicate the code with the proper changes for the other cases. If your compiler complains about the UPDATE clause in the ATOMIC construct, simply delete it. REDUCTION(+:dense) would create one copy of dense in each thread, which would consume a lot of memory, and the reduction applied in the end would grow slower and slower with the size of dense. For small arrays it would work better than atomic updates.
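For reference, a minimal sketch of the reduction variant (with the caveat that, while OpenMP in Fortran does permit array reductions, compiler support for large or allocatable arrays in REDUCTION varies):
!$OMP PARALLEL DO REDUCTION(+:dense)
do i = 1,Ntot
   ! the same eight conditional updates as in the question,
   ! with no ATOMIC directives needed inside the loop
end do
!$OMP END PARALLEL DO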
Answer 2:
See https://computing.llnl.gov/tutorials/openMP/#Clauses for a description of how to make variables shared and private.
It looks like all your variables should be shared, except the loop variable i, which must be private. This would suggest using the following line:
!$omp parallel do default(shared) private(i)
This should fix your segmentation fault (assuming I got all the variables correct).
However, there is the risk that different threads will attempt to overwrite the same parts of dense simultaneously, resulting in incorrect totals. To protect against this case, you will need to wrap each assignment to dense within something like an !$omp atomic or !$omp critical section.
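For illustration, a minimal sketch of one such guarded update, using the first of the eight assignments from the question (the other seven would be wrapped the same way):
!$omp critical
dense(int(x1(i)),int(y1(i)),int(z1(i))) = dense(int(x1(i)),int(y1(i)),int(z1(i))) &
     + dx1(i) * dy1(i) * dz1(i) * mpart
!$omp end critical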
However, you may find that such critical sections will cause threads to spend most of their time waiting, so you may not see any improvement over purely serial code.
In principle you could solve this problem by declaring dense with the reduction keyword, but unfortunately it cannot be used for arrays.
Source: https://stackoverflow.com/questions/13870564/gfortran-openmp-segmentation-fault-occurs-on-basic-do-loop