Question
I'm calling an external subroutine, Objee, from within another subroutine, FindVee:
subroutine FindVee(EVone,Vw0,Ve,Fye)
use nag_library, only: nag_wp
use My_interface_blocks, only: Objee
...
implicit none
real(kind=nag_wp) :: apmax, Val
...
call Objee(apmax,Val)
write(*,*) 'After Objee', apmax, Val
...
end subroutine FindVee
Subroutine Objee is:
subroutine Objee(ap,V)
use nag_library, only: nag_wp
...
implicit none
real(kind=nag_wp), intent(in) :: ap
real(kind=nag_wp), intent(out):: V
...
V = U(x,sigma) + beta*piy*yhat1(Nav*(Nav+1)/2) + &
& beta*eta*(1.0e0-piy)*yhat2(Nav*(Nav+1)/2)
V = - V
write(*,*) 'Exit Objee', ap, V
end subroutine Objee
Running the code like this produces the following output:
Exit Objee 0.0000000000000000 9997.5723796583643
Program received signal SIGBUS: Access to an undefined portion of a memory object.
Backtrace for this error:
#0 0x7FAA7ADFF7D7
#1 0x7FAA7ADFFDDE
#2 0x7FAA7A533FEF
#3 0x423B29 in findvee_Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7FAA7ADFF7D7
#1 0x7FAA7ADFFDDE
#2 0x7FAA7A533FEF
#3 0x7FAA7A0B9BA0
#4 0x7FAA7A0BAEFD
#5 0x7FAA7ADFF7D7
#6 0x7FAA7ADFFDDE
#7 0x7FAA7A533FEF
#8 0x423B29 in findvee_ Segmentation fault (core dumped)
I'm using gfortran 4.8.1 with the following options: -fopenmp -fcheck=all -fcheck=bounds -Wall -Wimplicit-interface -Wimplicit-procedure. The compiler doesn't show any warnings.
After a week of trying all sorts of things and scanning half the internet for a clue about what was happening, I decided to print the shape of V in Objee to see what Fortran gave me. Somehow, that solves the problem:
subroutine Objee(ap,V)
...
write(*,*) 'Exit Objee', ap, V, shape(V)
end subroutine Objee
This produces the following on screen:
Exit Objee 0.0000000000000000 9997.5723796583643
After Objee 0.0000000000000000 9997.5723796583643
Magic! Everything works, and the results look right. Could somebody explain what's going on here? And how can I fix the underlying problem without printing shape(V) on every call to Objee, of which there will be thousands?
After running valgrind ./programa --leak-check=full, I obtain the following output:
==2784== Invalid write of size 8
==2784== at 0x423B3F: findvee_ (FindVee.f95:66)
==2784== by 0x8: ???
==2784== Address 0x7ffffffffffffda8 is not stack'd, malloc'd or (recently) free'd
==2784==
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x4E4D7D7
#1 0x4E4DDDE
#2 0x56A3FEF
#3 0x423B3F in findvee_ at FindVee.f95:66
==2784== Invalid read of size 8
==2784== at 0x5C7FBA0: ??? (in /lib/x86_64-linux-gnu/libgcc_s.so.1)
==2784== by 0x5C80EFD: _Unwind_Backtrace (in /lib/x86_64-linux-gnu/libgcc_s.so.1)
==2784== by 0x4E4D7D7: _gfortran_backtrace (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==2784== by 0x4E4DDDE: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==2784== by 0x56A3FEF: ??? (in /lib/x86_64-linux-gnu/libc-2.17.so)
==2784== by 0x423B3E: findvee_ (FindVee.f95:64)
==2784== by 0x8: ???
==2784== Address 0x8000000000000008 is not stack'd, malloc'd or (recently) free'd
==2784==
==2784==
==2784== Process terminating with default action of signal 11 (SIGSEGV)
==2784== General Protection Fault
==2784== at 0x5C7FBA0: ??? (in /lib/x86_64-linux-gnu/libgcc_s.so.1)
==2784== by 0x5C80EFD: _Unwind_Backtrace (in /lib/x86_64-linux-gnu/libgcc_s.so.1)
==2784== by 0x4E4D7D7: _gfortran_backtrace (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==2784== by 0x4E4DDDE: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==2784== by 0x56A3FEF: ??? (in /lib/x86_64-linux-gnu/libc-2.17.so)
==2784== by 0x423B3E: findvee_ (FindVee.f95:64)
==2784== by 0x8: ???
==2784==
==2784== HEAP SUMMARY:
==2784== in use at exit: 3,859 bytes in 20 blocks
==2784== total heap usage: 157 allocs, 137 frees, 300,126 bytes allocated
==2784==
==2784== LEAK SUMMARY:
==2784== definitely lost: 58 bytes in 1 blocks
==2784== indirectly lost: 0 bytes in 0 blocks
==2784== possibly lost: 0 bytes in 0 blocks
==2784== still reachable: 3,801 bytes in 19 blocks
==2784== suppressed: 0 bytes in 0 blocks
==2784== Rerun with --leak-check=full to see details of leaked memory
==2784==
==2784== For counts of detected and suppressed errors, rerun with: -v
==2784== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 2 from 2)
Segmentation fault (core dumped)
Lines 64 and 66 (to which the output points) are:
64 call Objee(apmax,Val)
66 write(*,*) 'After Objee', apmax, Val
As an inexperienced user, I don't really understand how this helps me in any way, other than pointing to the portion of my code I already suspected was causing the crash. What am I missing here?
Answer 1:
Memory errors like this in Fortran have two common causes: 1) illegal subscript access, and 2) a mismatch between the actual arguments in a procedure call and the dummy arguments of the subroutine. Modern compilers and Fortran 90 and later help the programmer find these problems. As Peter suggested, are you using the full warning and error options of your compiler, especially run-time subscript checking? (What compiler are you using?) If you place your procedures in a module and use that module, the compiler will check that the arguments of each call are consistent with the subroutine. When a procedure is in a module, its interface is "known" to the other procedures or the main program that use the module, and this enables the checking. These two methods will find many of the errors that cause memory problems.
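The following is a minimal sketch of the module approach for this code. Several names are assumptions that do not appear in the original: the module names objee_mod and findvee_mod, the caller FindVee_demo, and the kind parameter wp. Here wp stands in for nag_wp, which the real code would still import from nag_library. The body of Objee is also a placeholder rather than the real objective.

```fortran
! Hypothetical sketch: module names, FindVee_demo and wp are illustrative.
module objee_mod
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: wp, Objee
  integer, parameter :: wp = real64   ! stand-in for nag_wp from nag_library
contains
  subroutine Objee(ap, V)
    real(kind=wp), intent(in)  :: ap
    real(kind=wp), intent(out) :: V
    ! Placeholder objective; the real code computes U(x,sigma) + ...
    V = ap**2 + 1.0_wp
    V = -V
  end subroutine Objee
end module objee_mod

module findvee_mod
  ! Using the module gives this scope Objee's explicit interface,
  ! so the compiler checks every call against the dummy arguments.
  use objee_mod, only: wp, Objee
  implicit none
  private
  public :: FindVee_demo
contains
  subroutine FindVee_demo(apmax, Val)
    real(kind=wp), intent(in)  :: apmax
    real(kind=wp), intent(out) :: Val
    call Objee(apmax, Val)
  end subroutine FindVee_demo
end module findvee_mod
```

Because Objee's interface now comes from the module, the compiler rejects a call with the wrong number, type, or kind of arguments. It also rejects a constant passed to an intent(out) dummy. These mistakes would otherwise silently corrupt memory at run time. The hand-written interface for Objee in My_interface_blocks should then be dropped, so that only one interface for Objee is visible.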
Adding "random" statements, such as output, can make memory errors disappear because it changes what the illegal memory access hits. The bad access may now do damage to the new code that can be tolerated, whereas before it did fatal damage, such as overwriting an address with a data value and thereby creating an illegal address. These bugs can be difficult to diagnose because the fatal error seems disconnected from the mistake in the code. The tools described in the first paragraph can be a big help.
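For the first cause, here is a hedged sketch of how an out-of-range subscript can be made to fail at the offending line, instead of corrupting memory in a way that only shows up later. It assumes an array like yhat1 in Objee, indexed with the same computed subscript Nav*(Nav+1)/2. The names packed_mod and last_packed are hypothetical.

The dummy array is assumed-shape (yhat(:)), which requires an explicit interface such as a module provides. Unlike an assumed-size dummy (yhat(*)), it carries its extent. That means both the explicit check and -fcheck=bounds can see the real upper bound.

```fortran
! Hypothetical sketch: packed_mod and last_packed are illustrative names.
module packed_mod
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: last_packed
contains
  ! yhat is assumed-shape, so size(yhat) is known inside the function
  ! and -fcheck=bounds can check accesses against the actual upper bound.
  function last_packed(yhat, nav) result(val)
    real(real64), intent(in) :: yhat(:)
    integer, intent(in)      :: nav
    real(real64)             :: val
    integer :: idx
    idx = nav*(nav+1)/2   ! same subscript expression as in Objee
    ! Stop at the faulty line instead of reading outside the array
    if (idx < 1 .or. idx > size(yhat)) then
      error stop 'last_packed: subscript out of range'
    end if
    val = yhat(idx)
  end function last_packed
end module packed_mod
```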
Source: https://stackoverflow.com/questions/24616204/writing-to-screen-solves-segmentation-error