I've set breakpoints on exit and _exit, but my program (a multithreaded app running on Linux 2.6.16.46-0.12, SLES 10) is somehow still exiting in a way I can't locate.
There are two common reasons for an _exit breakpoint to "miss": either GDB didn't set the breakpoint in the right place, or the program performs (a moral equivalent of) syscall(SYS_exit, ...).
What do info break and disassemble _exit say?
You might be able to convince GDB to set the breakpoint correctly with break *&_exit. Alternatively, GDB-7.0 supports catch syscall. Something like this should work regardless of how the program exits (assuming Linux/x86_64; note that on ix86 the numbers will be different):
(gdb) catch syscall 60
Catchpoint 3 (syscall 'exit' [60])
(gdb) catch syscall 231
Catchpoint 4 (syscall 'exit_group' [231])
(gdb) c
Catchpoint 4 (call to syscall 'exit_group'), 0x00007ffff7912f3d in _exit () from /lib/libc.so.6
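To see why a raw syscall slips past a breakpoint on _exit, here is a minimal sketch in Python (not part of the original answer). It starts a child process that registers an atexit handler and then calls exit_group directly through libc's syscall() wrapper. The names run_raw_exit_child and CHILD_SOURCE are made up for illustration. The syscall-number table covers only a few architectures, and the sketch assumes Linux with a libc that ctypes.CDLL(None) can reach.

```python
import platform
import subprocess
import sys

# exit_group syscall numbers on Linux for a few architectures.
# Assumption: the Python interpreter is built for the architecture that
# platform.machine() reports (not, say, a 32-bit build on a 64-bit kernel).
EXIT_GROUP_NR = {"x86_64": 231, "i386": 252, "i686": 252, "aarch64": 94}

# Child program: registers a cleanup handler, then terminates through a
# raw syscall. Nothing here calls exit() or _exit().
CHILD_SOURCE = """
import atexit, ctypes
atexit.register(lambda: print("atexit handler ran", flush=True))
libc = ctypes.CDLL(None)
libc.syscall(ctypes.c_long({nr}), ctypes.c_long(3))
"""


def run_raw_exit_child():
    """Run the child; return (exit code, whether the atexit handler ran).

    Returns None on platforms this sketch does not cover.
    """
    nr = EXIT_GROUP_NR.get(platform.machine())
    if nr is None or not sys.platform.startswith("linux"):
        return None
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE.format(nr=nr)],
        capture_output=True,
        text=True,
    )
    return proc.returncode, "atexit handler ran" in proc.stdout


if __name__ == "__main__":
    print(run_raw_exit_child())
```

On a supported platform the child should end with exit code 3 and the handler should never run, because the kernel tears the process down before any user-space exit path executes. A process that exits this way would be stopped by catch syscall but not by a breakpoint on exit or _exit.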
Update:
Your comment indicates that the _exit breakpoint is set correctly, so it's likely that your process just doesn't execute _exit.
That leaves syscall(SYS_exit, ...) and one other possibility (which I missed before): all threads executing pthread_exit. You might want to set a breakpoint on pthread_exit as well (and execute info threads each time you hit it; the last thread to do pthread_exit will cause the process to terminate).
Edit:
It's also worth noting that you can use mnemonic names rather than syscall numbers, and that you can add multiple syscalls to the catch list at once (the numbers in this output are the ix86 ones):
(gdb) catch syscall exit exit_group
Catchpoint 2 (syscalls 'exit' [1] 'exit_group' [252])
Setting the breakpoint on _exit was a good idea.
You might also try linking statically, just to take a stack of potential gdb complications off the table.
0177 looks suspiciously like the wait status that wait(2) returns for a stopped child, but gdb is printing the exit status, which is a different thing, so that's probably a real exit argument.
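The difference between the two readings of 0177 can be checked with the standard wait-status macros. This is a small sketch in Python, whose os module exposes those macros; describe_wait_status is a hypothetical helper name, not something from gdb or libc.

```python
import os


def describe_wait_status(status):
    """Decode a raw wait(2) status word using the standard macros."""
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    if os.WIFSTOPPED(status):
        return f"stopped by signal {os.WSTOPSIG(status)}"
    return "unknown"


if __name__ == "__main__":
    # Read as a wait status, a low byte of 0177 marks a stopped child.
    print(describe_wait_status(0o177))
    # Read as an exit argument, 0177 is just octal for 127. In a wait
    # status, an exit code sits in the next byte up.
    print(0o177, describe_wait_status(127 << 8))
```

So 0177 read as a wait status means "stopped", while 0177 read as an exit argument is simply exit code 127.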
It might be that some lazily bound references are unresolved in a shared library loaded into the process. I had exactly the same situation, where "someone somewhere" exited the process, and the cause turned out to be an unresolved reference.
Check your program with the "ldd -r" option.
It looks like ld.so (or whatever does the lazy resolving) sends such unresolved symbols to a uniform exit function (which should be abort, IMHO).
My situation:
$ ldd -r ./program
undefined symbol: XXXX (/usr/lib/libYYY.so)
$ ./program
program: started!
...
<program is running regardless of undefined references>
The exit happened when I ran a scenario that called the undefined function. The process always exited with exit code 127, which gdb reported as 0177 (octal for 127).