Question
A customer reported an error in one of our programs caused by a division by zero. We have only this VLM line:
kernel: myprog[16122] trap divide error rip:79dd99 rsp:2b6d2ea40450 error:0
I do not believe there is a core file for it.
I searched the Internet for a way to identify the line of the program that caused this division by zero, but so far without success.
I understand that 16122 is the pid of the program, so that will not help me.
I suspect that rsp:2b6d2ea40450 has something to do with the address of the line that caused the error (0x2b6d2ea40450) but is that true?
If it is, how can I translate it to an approximate location in the source, assuming I can load a debug version of myprog into gdb and then ask it to show the context around this address?
Any help will be greatly appreciated!
Answer 1:
rip is the instruction pointer and rsp is the stack pointer. The stack pointer is not very useful unless you have a core image or a running process, so the address to look up is the rip value (79dd99 in your log line), not the rsp value.
You can use either addr2line or the disassemble command in gdb to find the source line that raised the error, based on the instruction pointer.
$ cat divtest.c
main()
{
    int a, b;

    a = 1; b = a/0;
}
$ ./divtest
Floating point exception (core dumped)
$ dmesg|tail -1
[ 6827.463256] traps: divtest[3255] trap divide error ip:400504 sp:7fff54e81330 error:0 in divtest[400000+1000]
$ addr2line -e divtest 400504
./divtest.c:5
$ gdb divtest
(gdb) disass /m 0x400504
Dump of assembler code for function main:
2       {
   0x00000000004004f0 <+0>:     push   %rbp
   0x00000000004004f1 <+1>:     mov    %rsp,%rbp

3           int a, b;
4
5           a = 1; b = a/0;
   0x00000000004004f4 <+4>:     movl   $0x1,-0x4(%rbp)
   0x00000000004004fb <+11>:    mov    -0x4(%rbp),%eax
   0x00000000004004fe <+14>:    mov    $0x0,%ecx
   0x0000000000400503 <+19>:    cltd
   0x0000000000400504 <+20>:    idiv   %ecx
   0x0000000000400506 <+22>:    mov    %eax,-0x8(%rbp)

6       }
   0x0000000000400509 <+25>:    pop    %rbp
   0x000000000040050a <+26>:    retq

End of assembler dump.
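The same steps should carry over to the log line from the question. Below is a minimal sketch, not a definitive recipe: trap_ip is a hypothetical helper name, and ./myprog.debug is an assumed path to a debug build. The lookup is only meaningful if that debug build has the same code layout as the binary the customer actually ran (same source revision and compiler options). The helper pulls the rip/ip value out of a trap line, and the script then prints the lookup commands rather than running them, since the binary path is an assumption.

```shell
#!/bin/sh
# trap_ip (hypothetical helper): print the instruction-pointer value from a
# kernel trap line. Accepts both the older "rip:" and the newer "ip:" field.
trap_ip() {
    printf '%s\n' "$1" |
        sed -n 's/.*[[:space:]]r\{0,1\}ip:\([0-9a-fA-F]\{1,\}\).*/\1/p'
}

line='kernel: myprog[16122] trap divide error rip:79dd99 rsp:2b6d2ea40450 error:0'
ip=$(trap_ip "$line")

# ./myprog.debug is an assumed path; substitute the real debug binary.
# The commands are only printed here, not executed.
echo "addr2line -f -e ./myprog.debug 0x$ip"
echo "gdb -batch -ex 'info line *0x$ip' ./myprog.debug"
```

For the line in the question this prints commands that look up 0x79dd99. If the faulting address lies in a shared library or a position-independent executable, the raw value generally cannot be passed to addr2line as is, and the load address has to be taken into account first.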
Source: https://stackoverflow.com/questions/25450311/how-to-translate-kernels-trap-divide-error-rsp2b6d2ea40450-to-a-source-locatio