Analyzing CPU registers during kernel crash dump

一世执手 提交于 2019-12-12 07:24:13

问题


I was debugging a issue and hit the below kernel crash along with crash dump being generated. To some extent i do know, how to get to the exact line in the code where the issue occurred using gdb (l *(debug_fucntion+0x19)) command.

<1>BUG: unable to handle kernel paging request at ffffc90028213000
<1>IP: [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]
<4>PGD 103febe067 PUD 103febf067 PMD fd54e1067 PTE 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/kernel/mm/ksm/run
<4>CPU 7
<4>Modules linked in: dise(P)(U) ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge autofs4 8021q garp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun kvm uinput ipmi_devintf power_meter microcode iTCO_wdt iTCO_vendor_support dcdbas sg ses enclosure serio_raw lpc_ich mfd_core i7core_edac edac_core bnx2 ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dise]
<4>
<4>Pid: 1126, comm: diseproc Tainted: P        W  ---------------    2.6.32-431.el6.x86_64 #1 Dell Inc. PowerEdge R710/0MD99X
<4>RIP: 0010:[<ffffffffa0180279>]  [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]
<4>RSP: 0018:ffff880435fc5b88  EFLAGS: 00010282
<4>RAX: 0000000000000000 RBX: 0000000000010000 RCX: ffffc90028213000
<4>RDX: 0000000000010040 RSI: 0000000000010000 RDI: ffff880fe36a0000
<4>RBP: ffff880435fc5b88 R08: ffffffffa025d8a3 R09: 0000000000000000
<4>R10: 0000000000000004 R11: 0000000000000004 R12: 0000000000010040
<4>R13: 000000000000b101 R14: ffffc90028213010 R15: ffff880fe36a0000
<4>FS:  00007fbe6040b700(0000) GS:ffff8800618e0000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: ffffc90028213000 CR3: 0000000fc965b000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process diseproc (pid: 1126, threadinfo ffff880435fc4000, task ffff8807f8be8ae0)
<4>Stack:
<4> ffff880435fc5be8 ffffffffa0180498 0000000081158f46 00000c200000fd26
<4><d> ffffc90028162000 0000fec635fc5bc8 0000000000000018 ffff881011d80000
<4><d> ffffc90028162000 ffff8802f18fe440 ffff880fc80b4000 ffff880435fc5cec
<4>Call Trace:
<4> [<ffffffffa0180498>] cmd_dump+0x1c8/0x360 [dise]
<4> [<ffffffffa01978e1>] debug_log_show+0x91/0x160 [dise]
<4> [<ffffffffa013afb9>] process_debug+0x5a9/0x990 [dise]
<4> [<ffffffff810792c7>] ? current_fs_time+0x27/0x30
<4> [<ffffffffa013bc38>] dise_ioctl+0xd8/0x300 [dise]
<4> [<ffffffff8105a501>] ? hotplug_hrtick+0x21/0x60
<4> [<ffffffff8119db42>] vfs_ioctl+0x22/0xa0
<4> [<ffffffff8119dce4>] do_vfs_ioctl+0x84/0x580
<4> [<ffffffff8119e261>] sys_ioctl+0x81/0xa0
<4> [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: be c4 10 e1 48 8b 5d d8 44 01 f0 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 0f 1f 44 00 00 55 48 89 e5 0f 1f 44 00 00 <48> 8b 01 48 c1 e8 3c 83 f8 08 76 0b e8 f6 fb ff ff c9 c3 0f 1f
<1>RIP  [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]
<4> RSP <ffff880435fc5b88>
<4>CR2: ffffc90028213000

Question i have is

  1. Can the CPU register contents which are printed give more information? How do i decode them?

  2. Can i get to know variables values or data structure values from the crash dump which leads to the crash?

  3. What does the "Code : be c4 10 e1 48 8b 5d ... " tell me here?


回答1:


You must understand that you are inspecting (not debugging) at assembly level (not source code). This is important thing that you must hold in your head when inspecting crash dumps.

You have to read your crash dump report carefully line by line because it contains lots of info and also that's all you got.

When you got place when your code was crashed - you have to figure out why that happened by reading crash dump report and disassembly.

First line in your crash dump report tells you

BUG: unable to handle kernel paging request at ffffc90028213000

That means you are using invalid memory.

Line

Process diseproc (pid: 1126, threadinfo ffff880435fc4000, task ffff8807f8be8ae0)

tells you what happened in userspace on crash time. Seems like userspace process diseproc issued some command to your driver that caused crash.

Very important line is

IP: [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]

Try to issue dis debug_function command to disassemble debug_function, find debug_function+25(0x19 hex = 25 dec) and look around. Read it side by side with C source code for debug_function. Usually you can find crash place in C code by comparing callq instructions - disassembly will show printable name of called functions.

Next and most important is Call trace:

Call Trace:
 [<ffffffffa0180498>] cmd_dump+0x1c8/0x360 [dise]
 [<ffffffffa01978e1>] debug_log_show+0x91/0x160 [dise]
 [<ffffffffa013afb9>] process_debug+0x5a9/0x990 [dise]
 [<ffffffff810792c7>] ? current_fs_time+0x27/0x30
 [<ffffffffa013bc38>] dise_ioctl+0xd8/0x300 [dise]
 [<ffffffff8105a501>] ? hotplug_hrtick+0x21/0x60
 [<ffffffff8119db42>] vfs_ioctl+0x22/0xa0
 [<ffffffff8119dce4>] do_vfs_ioctl+0x84/0x580
 [<ffffffff8119e261>] sys_ioctl+0x81/0xa0
 [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Reading bottom to top: kernel got ioctl (from diseproc, obvious), kernel invoked ioctl handler dise_ioctl in dise module, then current_fs_time, process_debug, debug_log_show and finally cmd_dump.

Now you know:

  • Code path: dise_ioctl -> current_fs_time -> process_debug -> debug_log_show -> cmd_dump -> somehow to debug_function.
  • Approximate place in C code that caused crash
  • Reason to crash: access to invalid memory

With this info you have to use your last and most powerful method - thinking. Try to understand what variables/structures caused crash. Maybe some of them was freed by the time you arrived in debug_function? Maybe you mistype in pointer arithmetic?

Answers to questions:

  1. Most of the times CPU register values are pointless because it has nothing to do with your C code. Just some values, pointing to some memory - whatever. Yes, there are some extremely useful registers like RIP/EIP and RSP/ESP, but most of them is way too out of context.

  2. Very unlikely. You are actually not debugging - you are inspecting your dump - you don't have any debugging context.

  3. I agree with @user2699113 that it just memory content under pointer from RIP.

And remember - best debugging tool is your brain.




回答2:


See here... This has good documentation on how to debug kernel crashes.. See the section Objdump

What it tells it that you can disassemble your kernel image using objdump on vmlinux image. This command will output a large a text file of your kernel source code ... You can then grep for the problem causing EIP in the previously created output file.

PS: I would recommend doing objdump on vmlinux and saving it locally.




回答3:


  1. and 2.: It is rather hard to find out how cpu registers relates to parameters and variable values.

3: That code is assembler code. You may find it in your disassembled program and find out where that problem occured. Notice that there is <48> 8b 01 48 ... - and AFAIK the trap occurs at this assembler command. It means that you need to debug it by disassembling your code. If you compile your program (module) with debuggig symbols you can find out the number line where the problem occured.



来源:https://stackoverflow.com/questions/21850618/analyzing-cpu-registers-during-kernel-crash-dump

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!