GCC placing register args on the stack with a gap below local variables?

烂漫一生 submitted on 2021-02-19 06:22:28

Question


I tried to look at the assembly code for a very simple program.

int func(int x) {
    int z = 1337;
    return z;
} 

With GCC -O0, every C variable has a memory address that's not optimized away, so gcc spills its register arg: (Godbolt, gcc5.5 -O0 -fverbose-asm)

func:
        pushq   %rbp  #
        movq    %rsp, %rbp      #,
        movl    %edi, -20(%rbp) # x, x
        movl    $1337, -4(%rbp) #, z
        movl    -4(%rbp), %eax  # z, D.2332
        popq    %rbp    #
        ret

What is the reason that the function parameter x gets placed on the stack below the local variables? Why not place it at -4(%rbp) and the local below that?

And when placing it below the local variables, why not place it at -8(%rbp)?

Why leave a gap, using more of the red-zone than necessary? Couldn't this touch a new cache line that wouldn't otherwise have been touched in this leaf function?


Answer 1:


(First of all, don't expect efficient decisions at -O0. It turns out that the things you noticed at -O0 still happen at -O3 if we use volatile or other things to force the compiler to allocate stack space; otherwise this question would be a lot less interesting.)

What is the reason that the function parameter x gets placed on the stack below the local variables?

The choice is 100% arbitrary, and depends on compiler internals. GCC and clang both happen to make that choice, but it's basically irrelevant. The args arrive in registers and basically are just locals so it's totally up to the compiler to decide where to spill them (or not spill at all, if you enable optimization).

  • Order of local variable allocation on the stack links to ftp://gcc.gnu.org/pub/gcc/summit/2003/Optimal%20Stack%20Slot%20Assignment.pdf
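One way to see which slots a particular compiler picked, without reading the asm, is to compare addresses at run time. This is only a minimal sketch and not part of the original question: slot_delta is a hypothetical helper name, the result is implementation-specific, and taking the addresses forces the compiler to assign stack slots, so the layout may differ from a build where nothing needs an address.

```c
#include <stdint.h>

/* Hypothetical helper: byte distance from the local z to the slot of arg x.
   A negative result means x sits at a lower address than z. */
intptr_t slot_delta(int x) {
    int z = 1337;
    uintptr_t ax = (uintptr_t)&x;   /* address of the arg's stack slot */
    uintptr_t az = (uintptr_t)&z;   /* address of the local's stack slot */
    return (intptr_t)(ax - az);     /* distinct live objects, so never 0 */
}
```

If a build reproduces the layout from the question (x at -20(%rbp), z at -4(%rbp)), the value would be -16; other compilers, versions, and flags can give other results.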

But why save it further down the stack than really necessary?

Because of known(?) GCC missed-optimization bugs that waste stack space. For example, Why does GCC allocate more space than necessary on the stack? demonstrates x86-64 GCC -O3 allocating 24 instead of 8 bytes of stack space, where clang allocates 8. (I think I've seen a bug report about GCC sometimes using an extra 16 bytes of space when it needs to move RSP, unlike here where it's just using the red zone, but I can't find it on the GCC bugzilla.)

Note that the x86-64 System V ABI mandates 16-byte stack alignment before a call. After push %rbp and setting up RBP as a frame pointer, RBP and RSP are 16-byte aligned. -20(%rbp) is in the same aligned 16-byte chunk of stack space as -8(%rbp), so this gap isn't risking touching a new cache line or page that we wouldn't already have touched. (A naturally-aligned chunk of memory can't cross any boundary wider than itself, and x86-64 cache lines are always at least 32 bytes; these days always 64 bytes.)
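The parenthetical about naturally-aligned chunks can be checked with a little address arithmetic. The following is a sketch only; crosses_boundary and aligned16_never_crosses are hypothetical helper names, and the 4096-byte range is an arbitrary choice for the demonstration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the byte range [addr, addr + size) touches more than one aligned
   block of `line` bytes. Assumes size > 0 and line is a power of 2. */
bool crosses_boundary(uintptr_t addr, uintptr_t size, uintptr_t line) {
    uintptr_t first = addr & ~(line - 1);               /* block of first byte */
    uintptr_t last  = (addr + size - 1) & ~(line - 1);  /* block of last byte */
    return first != last;
}

/* Walk every 16-byte-aligned address in a small range and confirm that the
   aligned 16-byte chunk starting there never crosses a 32- or 64-byte line. */
bool aligned16_never_crosses(void) {
    for (uintptr_t addr = 0; addr < 4096; addr += 16) {
        if (crosses_boundary(addr, 16, 32) || crosses_boundary(addr, 16, 64))
            return false;
    }
    return true;
}
```

By contrast, a 16-byte range that is not naturally aligned, such as one starting at address 24, does cross a 32-byte boundary.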

However, this does become a missed optimization if we add a 2nd arg, int y: gcc5.5 (and current gcc9.2 -O0) spills it to -24(%rbp) which could be in a new cache line.
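For reference, the two-argument variant described here would look like the following; func2 is only an illustrative name, chosen to avoid clashing with the one-argument version in the question.

```c
/* Two-argument variant of the question's function (illustrative name). */
int func2(int x, int y) {
    int z = 1337;   /* the only local variable */
    return z;       /* x and y are unused, but -O0 still spills both */
}
```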


It turns out this missed optimization is not just because you used -O0 (compile fast, skip most optimization passes, make bad asm). Finding missed optimizations in -O0 output is meaningless unless they're still present at an optimization level anyone cares about, specifically -Os, -O2 or -O3.

We can prove it with code that uses volatile to still make gcc allocate stack space for args/locals at -O3. Another option would have been to pass their address to another function, but then GCC would have to reserve space instead of just using the red-zone below RSP.

int *volatile sink;

int func(int x, int y) {
    sink = &x;
    sink = &y;
    int z = 1337;
    sink = &z;
    return z;
}

(Godbolt, gcc9.2)

# gcc9.2 -O3  (hand-edited comments)
func(int, int):
        leaq    -20(%rsp), %rax                 # &x
        movq    %rax, sink(%rip)        # tmp84, sink
        leaq    -24(%rsp), %rax                 # &y
        movq    %rax, sink(%rip)        # tmp86, sink
        leaq    -4(%rsp), %rax                  # &z
        movq    %rax, sink(%rip)        # tmp88, sink
        movl    $1337, %eax     #,
        ret     
sink:
        .zero   8
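The alternative mentioned before the listing, passing the addresses to another function instead of storing them to a volatile pointer, might be sketched as follows. escape is a hypothetical opaque callee, and the noinline attribute and empty asm statement are GNU C extensions used here as an assumption to keep the compiler from seeing through the call. Because func_escape is no longer a leaf function, the compiler would have to reserve stack space by moving RSP rather than relying on the red zone.

```c
/* Hypothetical opaque callee: the empty asm statement makes the pointer
   "escape", so the compiler must assume the pointed-to object is used. */
__attribute__((noinline)) void escape(int *p) {
    __asm__ volatile("" : : "r"(p) : "memory");
}

int func_escape(int x, int y) {
    escape(&x);     /* forces x to have a real stack address */
    escape(&y);     /* same for y */
    int z = 1337;
    escape(&z);     /* and for the local */
    return z;
}
```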

Fun fact: clang -O3 spills the register args to the stack before storing their addresses to sink, as if it were a std::atomic release-store of the address and another thread could maybe load their values after getting the pointer from sink. But it doesn't do that for z. Actually spilling x and y is just a missed optimization, and I can only speculate on what part of clang's internal machinery might be to blame.

Anyway, clang does allocate z at -4(%rsp), x at -8, y at -12. So for whatever reason, clang also chooses to put the spill slots for the args below the locals.


Related:

  • Waste in memory allocation for local variables discusses GCC's main not assuming 16-byte alignment on entry to main.

  • Several possible duplicates about GCC allocating extra stack space for variables, but mostly just as required by alignment, not extra.



Source: https://stackoverflow.com/questions/58631698/gcc-placing-register-args-on-the-stack-with-a-gap-below-local-variables
