How to get a call stack backtrace? (deeply embedded, no library support)

I want my exception handlers and debug functions to be able to print call stack backtraces, basically just like the backtrace() library function in glibc. Unfortunately, my C library (Newlib) doesn't provide such a call.

I've got something like this:

#include <unwind.h&gt // GCC's internal unwinder, part of libgcc
_Unwind_Reason_Code trace_fcn(_Unwind_Context *ctx, void *d)
{
    int *depth = (int*)d;
    printf("\t#%d: program counter at %08x\n", *depth, _Unwind_GetIP(ctx));
    (*depth)++;
    return _URC_NO_REASON;
}

void print_backtrace_here()
{
    int depth = 0;
    _Unwind_Backtrace(&trace_fcn, &depth);
}

which basically works but the resulting traces aren't always complete. For example, if I do

int func3() { print_backtrace_here(); return 0; }
int func2() { return func3(); }
int func1() { return func2(); }
int main()  { return func1(); }

the backtrace only shows func3() and main(). (This is obv. a toy example, but I have checked the disassembly and confirmed that these functions are all here in full and not optimized out or inlined.)

Update: I tried this backtrace code on the old ARM7 system but with the same (or at least, as equivalent as possible) compiler options and linker script and it prints a correct, full backtrace (i.e. func1 and func2 aren't missing) and indeed it even backtraces up past main into the boot initialization code. So presumably the problem isn't with the linker script or compiler options. (Also, confirmed from disassembly that no frame pointer is used in this ARM7 test either).

The code is compiled with -fomit-frame-pointer, but my platform (bare metal ARM Cortex M3) defines an ABI that does not use a frame pointer anyway. (A previous version of this system used the old APCS ABI on ARM7 with forced stack frames and frame pointer, and an backtrace like the one here, which worked perfectly).

The whole system is compiled with -fexception, which ensures the necessary metadata that _Unwind uses is included in the ELF file. (_Unwind is designed for exception handling I think).

So, my question is: Is there a "standard", accepted way of getting reliable backtraces in embedded systems using GCC?

I don't mind having to mess around with the linker scripts and crt0 code if necessary, but don't want to have to make any chances to the toolchain itself.

Thanks!

King Sumo

For this you need -funwind-tables or -fasynchronous-unwind-tables In some targets this is required in order for _Unwind_Backtrace work properly!

gcc does return optimization. In func1() and func2() it does not call func2()/func3() - instead of this, it jumps to func2()/func3(), so func3() can return immediately to main().

In your case, func1() and func2() do not need to setup a stack frame, but if they would do (e.g. for local variables), gcc still can do the optimization if the function call is the last instruction - it then cleans up the stack before the jump to func3().

Have a look at the generated assembler code to see it.

Edit/Update:

To verify that this is the reason, do something after the function call, that cannot be reordered by the compiler (e.g. using a return value). Or just try compiling with -O0.

Since ARM platforms do not use a frame pointer, you never quite know how big the stackframe is and cannot simply roll out the stack beyond the single return value in R14.

When investigating a crash for which we do not have debug symbols, we simply dump the whole stack and lookup the closest symbol to each item in the instruction range. It does generate a load of false positives but can still be very useful for investigating crashes.

If you are running pure ELF executables, you can separate debug symbols out of your release executable. gdb can then help you find out what is going on from your standard unix core dump

Some compilers, like GCC optimize function calls like you mentioned in the example. For the operation of the code fragment, it is not needed to store the intermediate return pointers in the call chain. It's perfectly OK to return from func3() to main(), as the intermediate functions don't do anything extra besides calling another function.

It's not the same as code elimination (actually the intermediate functions could be completely optimized out), and a separate compiler parameter might control this kind of optimisation.

If you use GCC, try -fno-optimize-sibling-calls

Another handy GCC option is -mno-sched-prolog, which prevents instruction reordering in the function prologue, which is vital, if you want to parse the code byte-by-byte, like it is done here: http://www.kegel.com/stackcheck/checkstack-pl.txt

This is hacky, but I've found it works good enough considering the amount of code/RAM space required:

Assuming you're using ARM THUMB mode, compile with the following options:

-mtpcs-frame -mtpcs-leaf-frame  -fno-omit-frame-pointer

The following function is used to retrieve the callstack. Refer to the comments for more info:

/*
 * This should be compiled with:
 *  -mtpcs-frame -mtpcs-leaf-frame  -fno-omit-frame-pointer
 *
 *  With these options, the Stack pointer is automatically pushed to the stack
 *  at the beginning of each function.
 *
 *  This function basically iterates through the current stack finding the following combination of values:
 *  - <Frame Address>
 *  - <Link Address>
 *
 *  This combination will occur for each function in the call stack
 */
static void backtrace(uint32_t *caller_list, const uint32_t *caller_list_end, const uint32_t *stack_pointer)
{
    uint32_t previous_frame_address = (uint32_t)stack_pointer;
    uint32_t stack_entry_counter = 0;

    // be sure to clear the caller_list buffer
    memset(caller_list, 0, caller_list_end-caller_list);

    // loop until the buffer is full
    while(caller_list < caller_list_end)
    {
        // Attempt to obtain next stack pointer
        // The link address should come immediately after
        const uint32_t possible_frame_address = *stack_pointer;
        const uint32_t possible_link_address = *(stack_pointer+1);

        // Have we searched past the allowable size of a given stack?
        if(stack_entry_counter > PLATFORM_MAX_STACK_SIZE/4)
        {
            // yes, so just quite
            break;
        }
        // Next check that the frame addresss (i.e. stack pointer for the function)
        // and Link address are within an acceptable range
        else if((possible_frame_address > previous_frame_address) &&
                ((possible_frame_address < previous_frame_address + PLATFORM_MAX_STACK_SIZE)) &&
               ((possible_link_address  & 0x01) != 0) && // in THUMB mode the address will be odd
                (possible_link_address > PLATFORM_CODE_SPACE_START_ADDRESS &&
                 possible_link_address < PLATFORM_CODE_SPACE_END_ADDRESS))
        {
            // We found two acceptable values

            // Store the link address
            *caller_list++ = possible_link_address;

            // Update the book-keeping registers for the next search
            previous_frame_address = possible_frame_address;
            stack_pointer = (uint32_t*)(possible_frame_address + 4);
            stack_entry_counter = 0;
        }
        else
        {
            // Keep iterating through the stack until be find an acceptable combination
            ++stack_pointer;
            ++stack_entry_counter;
        }
    }

}

You'll need to update #defines for your platform.

Then call the following to populate a buffer with the current call stack:

uint32_t callers[8];
uint32_t sp_reg;
__ASM volatile ("mov %0, sp" : "=r" (sp_reg) );
backtrace(callers, &callers[8], (uint32_t*)sp_reg);

Again, this is rather hacky, but I've found it to work quite well. The buffer will be populated with link addresses of each function call in the call stack.

Does your executable contain debugging information, from compiling with the -g option? I think this is required to get a full stack trace without a frame pointer.

You might need -gdwarf-2 to make sure it uses a format that includes unwind information.

来源：https://stackoverflow.com/questions/3398664/how-to-get-a-call-stack-backtrace-deeply-embedded-no-library-support

标签

c++

gcc

arm

newlib