Analyzing CPU registers during kernel crash dump

你的背包 2021-01-07 08:25

I was debugging an issue and hit the kernel crash below, and a crash dump was generated. To some extent I do know how to get to the exact line in the code where the is

3 answers
  • 2021-01-07 09:11
    1. and 2.: It is rather hard to find out how CPU registers relate to parameters and variable values.

    3: That is machine code (the raw instruction bytes around the faulting address). You can find it in your disassembled program and work out where the problem occurred. Notice the sequence <48> 8b 01 48 ...: AFAIK the trap occurs at the instruction whose first byte is the one in angle brackets. This means you need to investigate it by disassembling your code. If you compile your program (module) with debugging symbols, you can find the line number where the problem occurred.
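A minimal sketch of locating that marked byte programmatically, assuming the x86 convention that the byte in angle brackets is the first byte of the instruction at RIP. The full Code: line is not quoted in the question, so the sample input below is hypothetical apart from the <48> 8b 01 48 bytes.

```python
# Hypothetical helper: only the "<48> 8b 01 48" bytes come from the report;
# the rest of the sample Code: line is made up for illustration.
def faulting_bytes(code_line: str) -> tuple[int, list[int]]:
    """Locate the byte wrapped in <..> in an x86 oops "Code:" line.

    Returns (index of the marked byte, the bytes from that byte onwards).
    Assumes the marked byte is the first byte of the instruction at RIP.
    """
    head, sep, tail = code_line.partition("Code:")
    hexpart = tail if sep else head  # accept the line with or without "Code:"
    marked = None
    values = []
    for i, tok in enumerate(hexpart.split()):
        if tok.startswith("<") and tok.endswith(">"):
            marked = i
            tok = tok[1:-1]
        values.append(int(tok, 16))
    if marked is None:
        raise ValueError("no <..> marker in Code: line")
    return marked, values[marked:]


if __name__ == "__main__":
    idx, rest = faulting_bytes("Code: 90 90 <48> 8b 01 48")
    print(idx, " ".join(f"{b:02x}" for b in rest))
```

If the marked instruction really is 48 8b 01, it decodes to mov (%rcx),%rax, a load through RCX, which would make RCX the register to compare against the faulting address.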

  • 2021-01-07 09:14

    You must understand that you are inspecting (not debugging) at the assembly level, not the source code level. This is an important thing to keep in mind when inspecting crash dumps.

    You have to read your crash dump report carefully, line by line, because it contains a lot of information and it is all you have.

    Once you have found the place where your code crashed, you have to figure out why it happened by reading the crash dump report and the disassembly.

    The first line of your crash dump report tells you

    BUG: unable to handle kernel paging request at ffffc90028213000
    

    That means your code accessed invalid memory, i.e. an address with no valid page mapping.
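To see which part of the kernel address space a pointer falls into, a rough classification helps. This sketch assumes the classic x86-64 layout (4-level paging, no KASLR) documented in the kernel's Documentation/x86/x86_64/mm.txt, which appears to match the addresses in this report; newer kernels move these ranges, so treat the output as a hint only.

```python
# Assumption: classic pre-KASLR, 4-level x86-64 kernel memory map
# (Documentation/x86/x86_64/mm.txt). Newer kernels use different ranges.
REGIONS = [
    (0xFFFF880000000000, 0xFFFFC7FFFFFFFFFF, "direct mapping of physical memory"),
    (0xFFFFC90000000000, 0xFFFFE8FFFFFFFFFF, "vmalloc/ioremap space"),
    (0xFFFFEA0000000000, 0xFFFFEAFFFFFFFFFF, "virtual memory map (struct page array)"),
    (0xFFFFFFFF80000000, 0xFFFFFFFF9FFFFFFF, "kernel text mapping"),
    (0xFFFFFFFFA0000000, 0xFFFFFFFFFF5FFFFF, "module mapping space"),
]
PAGE_SIZE = 4096  # x86-64 base page size


def classify(addr: int) -> str:
    """Name the region of the classic layout that contains addr."""
    for lo, hi, name in REGIONS:
        if lo <= addr <= hi:
            return name
    return "not in a known region of the classic layout"


def page_offset(addr: int) -> int:
    """Offset of addr within its 4 KiB page (0 means page-aligned)."""
    return addr % PAGE_SIZE


if __name__ == "__main__":
    for label, addr in [
        ("fault address", 0xFFFFC90028213000),
        ("IP", 0xFFFFFFFFA0180279),
        ("threadinfo", 0xFFFF880435FC4000),
    ]:
        print(f"{label:14} {addr:016x} {classify(addr)}, page offset {page_offset(addr):#x}")
```

Under those assumptions, ffffc90028213000 lands in vmalloc/ioremap space at page offset 0, the IP is in module space and the threadinfo pointer is in the direct mapping. A fault exactly on a page boundary is consistent with reading past the end of a vmalloc'd or ioremapped buffer, though that is only one possibility.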

    The line

    Process diseproc (pid: 1126, threadinfo ffff880435fc4000, task ffff8807f8be8ae0)
    

    tells you what was running in userspace at the time of the crash. It seems the userspace process diseproc issued some command to your driver that caused the crash.

    A very important line is

    IP: [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]
    

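    Try issuing the dis debug_function command (e.g. in the crash utility) to disassemble debug_function, then find debug_function+25 (0x19 hex = 25 dec) and look around it. The /0x160 part is the total size of the function in bytes. Read the disassembly side by side with the C source code of debug_function. Usually you can locate the crash site in the C code by matching callq instructions against the function calls in the source, because the disassembly shows the names of the called functions.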
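The IP line packs three numbers into symbol+offset/size. A small parser, offered as a sketch (the regular expression is an assumption about the format, based on the lines in this report), converts them to decimal so they can be matched against disassembly that, like crash's dis output, typically labels instructions with decimal offsets.

```python
import re

# "symbol+0xOFF/0xSIZE [module]" as printed on oops IP and call-trace lines.
SYM_RE = re.compile(
    r"(?P<name>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?",
    re.IGNORECASE,
)


def parse_symbol(text: str) -> dict:
    """Split 'name+0xOFF/0xSIZE [mod]' into name, offset, size and module."""
    m = SYM_RE.search(text)
    if m is None:
        raise ValueError(f"no symbol+offset/size in {text!r}")
    return {
        "name": m.group("name"),
        "offset": int(m.group("off"), 16),  # bytes from the function start
        "size": int(m.group("size"), 16),   # total length of the function
        "module": m.group("mod"),           # None for core-kernel symbols
    }


if __name__ == "__main__":
    ip_line = "IP: [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]"
    sym = parse_symbol(ip_line)
    print(f"look for <{sym['name']}+{sym['offset']}> in the disassembly")
```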

    Next, and most important, is the call trace:

    Call Trace:
     [<ffffffffa0180498>] cmd_dump+0x1c8/0x360 [dise]
     [<ffffffffa01978e1>] debug_log_show+0x91/0x160 [dise]
     [<ffffffffa013afb9>] process_debug+0x5a9/0x990 [dise]
     [<ffffffff810792c7>] ? current_fs_time+0x27/0x30
     [<ffffffffa013bc38>] dise_ioctl+0xd8/0x300 [dise]
     [<ffffffff8105a501>] ? hotplug_hrtick+0x21/0x60
     [<ffffffff8119db42>] vfs_ioctl+0x22/0xa0
     [<ffffffff8119dce4>] do_vfs_ioctl+0x84/0x580
     [<ffffffff8119e261>] sys_ioctl+0x81/0xa0
     [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290
     [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
    

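    Reading bottom to top: the kernel got an ioctl (obviously from diseproc) and, via sys_ioctl, do_vfs_ioctl and vfs_ioctl, invoked the ioctl handler dise_ioctl in the dise module, then process_debug, debug_log_show and finally cmd_dump. Entries marked with ?, such as current_fs_time and hotplug_hrtick, are addresses the unwinder found on the stack but could not confirm as part of the call chain; they are usually stale leftovers and can be ignored.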
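The same reading can be automated. This sketch parses call-trace lines in the format shown above, drops entries marked with ? (addresses the unwinder found on the stack but could not confirm as real frames) and returns the chain outermost caller first. The regular expression is an assumption based on this report's format.

```python
import re

# One call-trace entry: "[<address>] [? ]symbol+0xOFF/0xSIZE [module]".
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def call_chain(trace: str) -> list[str]:
    """Return the reliable frames of an oops call trace, outermost first.

    Lines printed with '?' are skipped: the unwinder found those addresses
    on the stack but could not confirm they belong to the call chain.
    """
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group(2):
            frames.append(m.group(3))
    return frames[::-1]  # the trace lists the innermost frame first


if __name__ == "__main__":
    excerpt = """ [<ffffffffa0180498>] cmd_dump+0x1c8/0x360 [dise]
 [<ffffffffa013afb9>] process_debug+0x5a9/0x990 [dise]
 [<ffffffff810792c7>] ? current_fs_time+0x27/0x30
 [<ffffffffa013bc38>] dise_ioctl+0xd8/0x300 [dise]"""
    print(" -> ".join(call_chain(excerpt)))
```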

    Now you know:

    • Code path: dise_ioctl -> process_debug -> debug_log_show -> cmd_dump -> somehow on to debug_function
    • The approximate place in the C code that caused the crash
    • The reason for the crash: an access to invalid memory

    With this information you have to use your last and most powerful method: thinking. Try to work out which variables or structures caused the crash. Maybe some of them had been freed by the time you arrived in debug_function? Maybe you made a mistake in some pointer arithmetic?

    Answers to your questions:

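    1. Most of the time CPU register values are of little use, because they have no direct relation to your C code: they are just values, some of which point to memory. Yes, some registers are extremely useful, like RIP/EIP (the faulting instruction) and RSP/ESP (the stack pointer), but most of them lack context. On x86-64 the first six integer arguments are passed in RDI, RSI, RDX, RCX, R8 and R9, but by the time of the fault the compiler may have reused those registers, so any mapping to parameters has to be checked against the disassembly.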

    2. Very unlikely. You are not actually debugging; you are inspecting a dump, so you don't have any live debugging context.

    3. I agree with @user2699113 that it is just the memory contents around the address in RIP, i.e. the raw bytes of the code surrounding the faulting instruction.

    And remember: the best debugging tool is your brain.

  • 2021-01-07 09:19

    See here. This has good documentation on how to debug kernel crashes; see the section Objdump.

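    What it says is that you can disassemble your kernel image by running objdump on the vmlinux image. This produces a large text file containing the disassembly of your kernel (with source lines interleaved if the kernel was built with debug info and you pass objdump -S). You can then grep for the problem-causing EIP in that output file. Note that in this crash the IP (ffffffffa0180279) is inside the dise module rather than the core kernel, so you would run objdump on the module's .ko file instead and look for debug_function+0x19, because the module's runtime load address does not appear in that file.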
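Because the IP here carries the [dise] tag, it lies in a loadable module, so the code lives in the module's object file (presumably dise.ko, an assumption) rather than in vmlinux, and objdump shows addresses relative to that file rather than the runtime load address. A sketch of the arithmetic: subtract the offset from the IP to get the function's runtime start, and add the offset to wherever objdump places the function to get the address to search for. The 0x1a0 function address in the usage lines is hypothetical.

```python
def function_bounds(ip: int, offset: int, size: int) -> tuple[int, int]:
    """Return the [start, end) runtime addresses of the function containing ip,
    using the +offset/size pair printed next to the symbol."""
    start = ip - offset
    return start, start + size


def objdump_address(func_start_in_file: int, offset: int) -> str:
    """Address to search for in 'objdump -d' output of the object file,
    in the form objdump usually prints it (lowercase hex, no 0x, then ':')."""
    return f"{func_start_in_file + offset:x}:"


if __name__ == "__main__":
    start, end = function_bounds(0xFFFFFFFFA0180279, 0x19, 0x160)
    print(f"debug_function runs from {start:#x} to {end:#x}")
    # Hypothetical: suppose objdump -d dise.ko shows debug_function at 0x1a0.
    print("grep for", objdump_address(0x1A0, 0x19))
```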

    PS: I would recommend running objdump on vmlinux once and saving the output locally.
