Why is gcc generating an extra return address?

孤街浪徒 2020-12-03 16:55

I am currently learning the basics of assembly and came across something odd when looking at the instructions generated by gcc (6.1.1).

Here is the source:



        
2 Answers
  •  感动是毒
    2020-12-03 16:57

    GCC copies the return address in order to create a normal-looking stack frame that debuggers can walk by following the chain of saved frame pointer (EBP) values. Part of the reason GCC generates code like this, though, is to handle the worst case, where the function also has a variable-length stack allocation, as can happen when a variable-length array or alloca() is used.

    Normally, when code is compiled without optimization (or with the -fno-omit-frame-pointer option), the compiler creates a stack frame that includes a link back to the previous stack frame, using the caller's saved frame pointer value. The compiler saves the previous frame pointer value as the first thing on the stack after the return address and then sets the frame pointer to point to this location on the stack. When all the functions in a program do this, the frame pointer register becomes a pointer to a linked list of stack frames, one that can be traced back all the way to the program's startup code. The return addresses in each frame show which function each frame belongs to.
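    As a minimal sketch of that linked list, the C code below simulates a 32-bit stack as an array of 4-byte words and walks the saved-EBP chain, reading the return address at [ebp+4] of each frame. The sim_mem type, the helper names, and the convention that a saved EBP of 0 ends the chain are assumptions made for illustration; a real unwinder would read the live stack (for example starting from GCC's __builtin_frame_address(0)) and would go wrong on any frame compiled without a frame pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_WORDS 64 /* toy memory: 64 four-byte words (addresses 0..255) */

/* Hypothetical simulated 32-bit memory: byte addresses, 4-byte words. */
typedef struct {
    uint32_t words[SIM_WORDS];
} sim_mem;

static uint32_t load32(const sim_mem *m, uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < SIM_WORDS);
    return m->words[addr / 4];
}

static void store32(sim_mem *m, uint32_t addr, uint32_t value) {
    assert(addr % 4 == 0 && addr / 4 < SIM_WORDS);
    m->words[addr / 4] = value;
}

/* Walk the chain of saved frame pointers starting at `ebp`. In each frame,
   [ebp] holds the caller's saved EBP and [ebp+4] the return address.
   A saved EBP of 0 is assumed to mark the outermost frame.
   Returns how many return addresses were written to `out`. */
static size_t walk_frames(const sim_mem *m, uint32_t ebp,
                          uint32_t *out, size_t max) {
    size_t n = 0;
    while (ebp != 0 && n < max) {
        out[n++] = load32(m, ebp + 4); /* this frame's return address */
        ebp = load32(m, ebp);          /* follow the link to the caller */
    }
    return n;
}
```

    With three frames at 0x40, 0x80 and 0xC0 linked through their saved EBP slots, walk_frames starting at 0x40 collects the three return addresses innermost first.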

    However, instead of saving the previous frame pointer first, GCC begins a function that needs to align the stack by performing that alignment, which puts an unknown number of padding bytes after the return address. So, in order to create what looks like a normal stack frame, it copies the return address after those padding bytes and then saves the previous frame pointer. The problem with this is that it's not really necessary to copy the return address, as demonstrated by Clang and shown in Peter Cordes' answer. Like Clang, GCC could instead have immediately saved the previous frame pointer value (EBP) and then aligned the stack.
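    To make the difference concrete, here is a small C simulation of both orderings on a toy 32-bit stack. The GCC sequence is the one GCC commonly emits for a 32-bit main (the assembly from the question is not reproduced here, so treat it as representative rather than a copy), and the Clang sequence is a simplified save-then-align prologue. The cpu type and helper names are hypothetical. After gcc_prologue, [ebp+4] holds a copy of the return address and the arguments are reached through the saved ECX value; after clang_prologue, [ebp+4] is the original return address and the first argument is at [ebp+8].

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64 /* toy memory: 64 four-byte words (addresses 0..255) */

/* Hypothetical minimal i386 state: byte addresses, 4-byte words. */
typedef struct {
    uint32_t mem[MEM_WORDS];
    uint32_t esp, ebp, ecx;
} cpu;

static uint32_t load32(const cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return c->mem[addr / 4];
}

static void push32(cpu *c, uint32_t value) {
    c->esp -= 4;
    assert(c->esp % 4 == 0 && c->esp / 4 < MEM_WORDS);
    c->mem[c->esp / 4] = value;
}

/* GCC-style prologue: align first, then rebuild a normal-looking frame.
     lea  ecx, [esp+4]    ; ecx -> incoming arguments
     and  esp, -16        ; 0, 4, 8 or 12 bytes of padding here
     push DWORD [ecx-4]   ; copy the return address below the padding
     push ebp
     mov  ebp, esp
     push ecx             ; saved pointer to the argument area */
static void gcc_prologue(cpu *c) {
    c->ecx = c->esp + 4;
    c->esp &= ~UINT32_C(15);
    push32(c, load32(c, c->ecx - 4));
    push32(c, c->ebp);
    c->ebp = c->esp;
    push32(c, c->ecx);
}

/* Clang-style prologue: save EBP first, then align.
     push ebp
     mov  ebp, esp
     and  esp, -16 */
static void clang_prologue(cpu *c) {
    push32(c, c->ebp);
    c->ebp = c->esp;
    c->esp &= ~UINT32_C(15);
}
```

    In both cases [ebp] holds the caller's EBP, so a debugger sees a well-formed chain; only the GCC version needs the duplicated return address to get there.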

    Essentially, what both compilers do is create a split stack frame, one split in two by the alignment padding. The top part, above the padding, is where the local variables are stored. The bottom part, below the padding, is where the incoming arguments can be found. Clang uses ESP to access the top part and EBP to access the bottom part. GCC uses EBP to access the top part and uses the ECX value saved on the stack by the prologue to access the bottom part. In both cases EBP points to what looks like a normal stack frame, though only GCC's EBP can be used to access the function's local variables as with a normal frame.

    So in the normal case Clang's strategy is clearly better: there's no need to copy the return address, and there's no need to save an extra value (the ECX value) on the stack. However, when the compiler needs to both align the stack and allocate something of variable size, an extra value does need to be stored somewhere. Since the variable allocation means that the stack pointer no longer has a fixed offset from the local variables, it can't be used to access them anymore. Two separate values need to be kept somewhere: one that points at the top part of the split frame and one that points at the bottom part.
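    A function like the following (hypothetical, C11) is the kind that hits this worst case: the _Alignas(64) local exceeds the stack alignment the ABI guarantees at function entry, so the stack must be realigned, and the VLA moves the stack pointer by an amount known only at run time. Compiling it with -m32 -S under each compiler is one way to look at the two frame-pointer strategies; the assertion inside only checks that the realigned local really ends up 64-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example needing both stack realignment and a
   variable-length allocation. After `copy` is allocated, ESP sits a
   run-time distance below the aligned locals, so they cannot be
   addressed at a fixed offset from ESP. */
int sum_with_aligned_scratch(const int *src, size_t n) {
    assert(n > 0);                      /* a zero-length VLA is undefined */
    _Alignas(64) int scratch[16] = {0}; /* over-aligned: forces realignment */
    int copy[n];                        /* variable-length allocation */

    assert((uintptr_t)scratch % 64 == 0);
    for (size_t i = 0; i < n; i++) {
        copy[i] = src[i];
        scratch[i % 16] += copy[i];
    }
    int total = 0;
    for (size_t i = 0; i < 16; i++)
        total += scratch[i];
    return total;
}
```

    For example, calling it on {1, 2, 3, 4, 5} with n = 5 returns 15.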

    If you look at the code Clang generates when compiling a function that both requires aligning the stack and has a variable-length allocation, you'll see that it allocates a register that effectively becomes a second frame pointer, one that points to the top part of the split frame. GCC doesn't need to do this because it's already using EBP to point to the top part. Clang continues to use EBP to point to the bottom part, while GCC uses the saved ECX value.

    Clang isn't perfect here, though, since it also allocates another register to restore the stack to the value it had before the variable-length allocation once that allocation goes out of scope. In many cases this isn't necessary, and the register used as the second frame pointer could be used to restore the stack instead.
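    The out-of-scope restore matters whenever a VLA lives in an inner block. In this hypothetical C sketch, each loop iteration allocates a fresh VLA that dies at the end of the loop body, so the generated code must put the stack pointer back on every iteration; with n = 1024 and 20000 rounds, skipping that restore would need far more stack than a typical default stack size provides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: each iteration allocates a VLA whose lifetime ends
   at the end of the loop body, so the compiler must restore the stack
   pointer to its pre-allocation value every time around the loop. */
long sum_last_elements(size_t n, size_t rounds) {
    assert(n > 0); /* a zero-length VLA is undefined behavior */
    long total = 0;
    for (size_t r = 0; r < rounds; r++) {
        long buf[n];                 /* stack space claimed here... */
        for (size_t i = 0; i < n; i++)
            buf[i] = (long)(i + r);
        total += buf[n - 1];
    }                                /* ...and released here */
    return total;
}
```

    For instance, sum_last_elements(4, 3) adds 3 + 4 + 5 and returns 12.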

    GCC's strategy seems to be based on the desire to have a single set of boilerplate prologue and epilogue code sequences that can be used for all functions that need stack alignment. It also avoids reserving any registers for the lifetime of the function, although the saved ECX value can be used directly from ECX if it hasn't been clobbered yet. I suspect that generating more flexible code like Clang does would be difficult given how GCC generates function prologue and epilogue code.
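    The boilerplate nature of GCC's sequence shows in its epilogue, which works no matter how much padding the prologue inserted. The sketch below simulates the prologue together with the epilogue GCC typically pairs with it (mov ecx, [ebp-4]; leave; lea esp, [ecx-4]; ret) on a toy 32-bit stack; the cpu type and helper names are hypothetical. Because ESP is rebuilt from the saved ECX value, it returns to exactly where the caller left it for every starting alignment, and ret uses the original return address rather than the copy.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64 /* toy memory: 64 four-byte words (addresses 0..255) */

/* Hypothetical minimal i386 state: byte addresses, 4-byte words. */
typedef struct {
    uint32_t mem[MEM_WORDS];
    uint32_t esp, ebp, ecx;
} cpu;

static uint32_t load32(const cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return c->mem[addr / 4];
}

static void push32(cpu *c, uint32_t value) {
    c->esp -= 4;
    assert(c->esp % 4 == 0 && c->esp / 4 < MEM_WORDS);
    c->mem[c->esp / 4] = value;
}

static uint32_t pop32(cpu *c) {
    uint32_t value = load32(c, c->esp);
    c->esp += 4;
    return value;
}

/* lea ecx,[esp+4]; and esp,-16; push [ecx-4]; push ebp; mov ebp,esp; push ecx */
static void gcc_prologue(cpu *c) {
    c->ecx = c->esp + 4;
    c->esp &= ~UINT32_C(15);
    push32(c, load32(c, c->ecx - 4));
    push32(c, c->ebp);
    c->ebp = c->esp;
    push32(c, c->ecx);
}

/* mov ecx,[ebp-4]; leave; lea esp,[ecx-4]; ret
   Returns the address that `ret` would jump to. */
static uint32_t gcc_epilogue_and_ret(cpu *c) {
    c->ecx = load32(c, c->ebp - 4); /* reload the pointer to the arguments */
    c->esp = c->ebp;                /* leave: mov esp, ebp ...             */
    c->ebp = pop32(c);              /*        ... pop ebp                  */
    c->esp = c->ecx - 4;            /* undo the padding in one step        */
    return pop32(c);                /* ret: pops the ORIGINAL return slot  */
}
```

    The same two sequences serve every such function; only the local-variable allocation between them changes.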

    (However, when generating 64-bit x86 code, GCC 8 and later do use a simpler prologue for functions that need to over-align the stack, as long as they don't need any variable-length stack allocations. It's more like Clang's strategy.)
