Debugging SIGBUS on x86 Linux

生来不讨喜 2020-11-30 08:38

What can cause SIGBUS (bus error) in a generic x86 userland application on Linux? All of the discussion I've been able to find online is about memory alignment errors.

9 Answers
  • 2020-11-30 09:19

    Oh yes there's one more weird way to get SIGBUS.

    If the kernel fails to page in a code page, either because of memory pressure (the OOM killer must be disabled) or because of a failed I/O request, the process receives SIGBUS.

  • 2020-11-30 09:19

    This was briefly mentioned above as a "failed IO request", but I'll expand upon it a bit.

    A frequent case is when you lazily grow a file using ftruncate, map it into memory, start writing data, and then run out of space on the filesystem. Physical space for a mapped file is allocated on page faults; if there is none left, the process receives a SIGBUS.

    If you need your application to recover correctly from this error, it makes sense to explicitly reserve space with fallocate before calling mmap. Handling ENOSPC in errno after the fallocate call is much simpler than dealing with signals, especially in a multi-threaded application.
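
A minimal sketch of the reserve-then-map approach described above. The helper name `map_with_reserved_space` is hypothetical. It uses `posix_fallocate` rather than the Linux-specific `fallocate`; `posix_fallocate` returns the error number directly instead of setting `errno`, so the sketch copies it into `errno` to keep one error convention.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: open (or create) `path`, reserve `len` bytes of real
 * disk space, then map the file read/write.
 * Returns the mapping, or NULL with errno set (ENOSPC if the disk is full). */
void *map_with_reserved_space(const char *path, size_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;

    /* Allocate the blocks now and grow the file to `len` bytes, so that
     * later page faults on the mapping do not need new disk space. */
    int rc = posix_fallocate(fd, 0, (off_t)len);
    if (rc != 0) {
        close(fd);
        errno = rc;          /* posix_fallocate reports errors via its return value */
        return NULL;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int saved = errno;
    close(fd);               /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) {
        errno = saved;
        return NULL;
    }
    return p;
}
```

A caller would check for NULL and inspect errno at this single point, for example treating ENOSPC as "disk full", instead of installing a SIGBUS handler around every write to the mapping.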

  • 2020-11-30 09:25

    You may see SIGBUS when you're running a binary from NFS (Network File System) and the file is changed while it is running. See https://rachelbythebay.com/w/2018/03/15/core/.

  • 2020-11-30 09:26

    SIGBUS on x86 (including x86_64) Linux is a rare beast. It may appear on an attempt to access memory past the end of an mmapped file, or in some other situations described by POSIX.

    But it's not easy to get SIGBUS from a hardware fault. An unaligned access that faults, whether from a SIMD instruction or not, usually results in SIGSEGV. Stack overflows result in SIGSEGV. Even accesses to addresses not in canonical form result in SIGSEGV. In most of these cases the CPU raises #GP, which almost always maps to SIGSEGV.

    Now, here are some ways to get SIGBUS due to a CPU exception:

    1. Enable the AC bit in EFLAGS, then perform an unaligned access with any memory read or write instruction. See this discussion for details.

    2. Cause a canonical violation via a stack pointer register (rsp or rbp), which generates #SS. Here's an example for GCC (compile with gcc test.c -o test -masm=intel):

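    int main(void)
    {
        /* Intel syntax: build with gcc test.c -o test -masm=intel */
        __asm__("mov rbp,0x400000000000000\n" /* put a non-canonical address in rbp */
                "mov rax,[rbp]\n"             /* rbp-based access: #SS, delivered as SIGBUS */
                "ud2\n");                     /* not reached if the load above faults */
    }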
    
  • 2020-11-30 09:32

    You can get a SIGBUS from an unaligned access if you turn on the unaligned access trap, but that is normally off on x86. You can also get it from accessing a memory-mapped device if there's an error of some kind.

    Your best bet is using a debugger to identify the faulting instruction (SIGBUS is synchronous), and trying to see what it was trying to do.

  • 2020-11-30 09:37

    SIGBUS can happen in Linux for quite a few reasons other than memory alignment faults - for example, if you attempt to access an mmap region beyond the end of the mapped file.

    Are you using anything like mmap, shared memory regions, or similar?
