Why does the compiler reserve a little stack space but not the whole array size?

梦谈多话 2020-12-04 01:01

The following code

int main() {
  int arr[120];
  return arr[0];
}

compiles into this:

  sub     rsp, 360
  mov     eax, D         


        
2 Answers
  •  谎友^ (original poster)
    2020-12-04 01:59

    You're on x86-64 Linux, where the ABI includes a red-zone: the 128 bytes below RSP, which a function can use without first moving RSP. See https://stackoverflow.com/tags/red-zone/info.

    So the array goes from the bottom of the red-zone up to near the top of what gcc reserved. Compile with -mno-red-zone to see different code-gen.

    Also, your compiler is using RSP, not ESP. ESP is the low 32 bits of RSP, and x86-64 normally has RSP outside the low 32 bits so it would crash if you truncated RSP to 32 bits.
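
    To make the truncation concrete, here is a minimal C sketch. The address used in the note below is a made-up example of a typical user-space stack pointer, not a value from the question, and both helper names are hypothetical.

```c
#include <stdint.h>

/* Keep only the low 32 bits, i.e. what ESP holds when RSP holds rsp.
   (Hypothetical helper, for illustration only.) */
static uint32_t truncate_to_esp(uint64_t rsp) {
    return (uint32_t)rsp;
}

/* Nonzero if truncating to 32 bits would change the address. */
static int truncation_loses_bits(uint64_t rsp) {
    return (uint64_t)truncate_to_esp(rsp) != rsp;
}
```

    For the assumed value 0x00007ffd12345678, truncate_to_esp gives 0x12345678: a different address, and very likely not one that belongs to the stack.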


    On the Godbolt compiler explorer, I get this from gcc -O3 (with gcc 6.3, 7.3, and 8.1):

    main:
        sub     rsp, 368
        mov     eax, DWORD PTR [rsp-120]   # -120, not -480 which would be outside the red-zone
        add     rsp, 368
        ret
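
    The arithmetic behind that layout can be sketched in C. The offsets are relative to RSP after the sub; the helper names are hypothetical, and only the 128-byte red-zone size and the numbers in the gcc output above are taken as given.

```c
enum { RED_ZONE_BYTES = 128 };  /* x86-64 System V red-zone size */

/* Offset from RSP of the first byte past an object that starts at
   start_off (relative to RSP) and is size_bytes long. */
static int end_offset(int start_off, int size_bytes) {
    return start_off + size_bytes;
}

/* Nonzero if start_off is at or above the bottom of the red-zone. */
static int at_or_above_red_zone_bottom(int start_off) {
    return start_off >= -RED_ZONE_BYTES;
}
```

    end_offset(-120, 480) is 360, so the 480-byte array spans rsp-120 up to rsp+360: 120 bytes in the red-zone plus 360 of the 368 bytes gcc reserved. An offset of -480 fails the red-zone check.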
    

    Did you fake your asm output, or does some other version of gcc or some other compiler really load from outside the red-zone on this undefined behaviour (reading an uninitialized array element)? clang just compiles it to ret, and ICC just returns 0 without loading anything. (Isn't undefined behaviour fun?)


    int ext(int*);
    int foo() {
      int arr[120];     // can't use the red-zone because of later non-inline function call
      ext(arr);
      return arr[0];
    }

    # gcc output; clang and ICC are similar.
        sub     rsp, 488
        mov     rdi, rsp
        call    ext
        mov     eax, DWORD PTR [rsp]
        add     rsp, 488
        ret
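
    One plausible way to account for 488 rather than 480 (an assumption, the thread does not spell it out): the x86-64 System V ABI wants RSP 16-byte aligned at a call, and on function entry RSP is 8 bytes off that because the caller's call pushed a return address. So the amount subtracted has to be 8 modulo 16. A sketch, with a hypothetical helper name:

```c
/* Smallest value >= locals_bytes that is 8 modulo 16, so that RSP is
   16-byte aligned again after `sub rsp, N` in a function entered with
   RSP at 8 modulo 16. Assumes locals_bytes >= 0 and no pushes. */
static int min_sub_for_call(int locals_bytes) {
    int n = locals_bytes;
    while (n % 16 != 8) {
        n++;
    }
    return n;
}
```

    min_sub_for_call(480) is 488, which matches the gcc output for foo.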
    

    But we can avoid UB in a leaf function without letting the compiler optimize away the store/reload, by using inline asm that claims to write arr[0]. (We could maybe just use volatile instead of inline asm.)

    int bar() {
      int arr[120];
      asm("nop # operand was %0" :"=m" (arr[0]) );   // tell the compiler we write arr[0]
      return arr[0];
    }
    
    # gcc output
    bar:
        sub     rsp, 368
        nop # operand was DWORD PTR [rsp-120]
        mov     eax, DWORD PTR [rsp-120]
        add     rsp, 368
        ret
    

    Note that the compiler only assumes we wrote arr[0], not any of arr[1..119].
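
    For comparison, a sketch of the volatile alternative. This is an assumption about what that version might look like, not code from the Godbolt session, and the generated asm has not been checked here. Unlike the inline-asm version, it stores a real (arbitrary) value.

```c
/* volatile forces the compiler to emit both the store and the reload. */
int bar_volatile(void) {
    volatile int arr[120];
    arr[0] = 42;        /* arbitrary value, so the read below is defined */
    return arr[0];
}
```

    Since it is still a leaf function, the compiler remains free to place part of arr in the red-zone.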

    But anyway, gcc/clang/ICC all put the bottom of the array in the red-zone. See the Godbolt link.

    This is a good thing in general: more of the array is within range of a disp8 from RSP, so references to arr[0] up to arr[63] or so could use [rsp+disp8] instead of [rsp+disp32] addressing modes. Not super useful for one big array, but as a general algorithm for allocating locals on the stack it makes total sense. (gcc doesn't go all the way to the bottom of the red-zone for arr, but clang does, using sub rsp, 360 instead of 368 so the array is still 16-byte aligned. IIRC, the x86-64 System V ABI at least recommends this for arrays with automatic storage with size >= 16 bytes.)
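
    The disp8 reach can be worked out with a small C sketch. A disp8 is a sign-extended 8-bit displacement, so it covers -128..127; the helper name is hypothetical.

```c
/* disp8 is a sign-extended 8-bit displacement: -128..127. */
enum { DISP8_MIN = -128, DISP8_MAX = 127 };

/* Highest index i such that arr[i], located at
   start_off + i * elem_bytes from RSP, still fits in a disp8.
   Assumes DISP8_MIN <= start_off <= DISP8_MAX and elem_bytes > 0. */
static int last_index_with_disp8(int start_off, int elem_bytes) {
    return (DISP8_MAX - start_off) / elem_bytes;
}
```

    With gcc's layout (arr at rsp-120) this gives 61, with arr at the very bottom of the red-zone (rsp-128, as clang does) it gives 63, and it would be only 31 if the array started at rsp.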
