For the following C code:
struct _AStruct {
int a;
int b;
float c;
float d;
int e;
};
typedef struct _AStruct AStruct;
AStruct test_cal
Why does the caller in Linux32 do these extra subs?
The reason is the hidden pointer that the compiler injects in order to return the struct by value (the same mechanism that named return value optimization relies on). The System V i386 ABI, page 41, in the section about "Function Returning Structures or Unions", says:
The called function must remove this address from the stack before returning.
That is why you get a ret $0x4 at the end of test_callee5(): it is for compliance with the ABI.
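To make the hidden pointer concrete, here is a rough C sketch of what the compiler does behind the scenes. The names make_by_value, make_by_hidden_pointer and seed are made up for illustration and are not from the question; the second function only approximates the lowering by hand, since the ret $0x4 part cannot be expressed in C.

```c
#include <assert.h>
#include <stddef.h>

struct _AStruct {
    int a;
    int b;
    float c;
    float d;
    int e;
};
typedef struct _AStruct AStruct;

/* What the source code says: return the struct by value. */
AStruct make_by_value(int seed)
{
    AStruct s = { seed, seed + 1, 1.5f, 2.5f, seed + 2 };
    return s;
}

/* Roughly what gets generated on i386: the caller passes the address of
 * the result buffer as a hidden first argument, the callee fills it in
 * and hands the same address back (in %eax). Hypothetical hand-written
 * equivalent, not actual compiler output. */
AStruct *make_by_hidden_pointer(AStruct *out, int seed)
{
    assert(out != NULL);
    out->a = seed;
    out->b = seed + 1;
    out->c = 1.5f;
    out->d = 2.5f;
    out->e = seed + 2;
    return out;
}
```

Calling make_by_hidden_pointer(&result, 10) leaves the same field values in result as AStruct result = make_by_value(10); would.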
Now, about the sub $0x4,%esp just after each test_callee5() call site: it is a side effect of the above rule, combined with the optimized code generated by the C compiler. As the local stack space is reserved entirely up front by:
3: sub $0x38,%esp
there is no need to push/pop the hidden pointer: it is just written at the bottom of the pre-reserved space (pointed at by esp), using mov %eax,(%esp) at lines 9 and 17. As the stack pointer is not decremented before the call, the sub $0x4,%esp is there to negate the effect of ret $0x4 and keep the stack pointer unchanged.
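The bookkeeping can be checked with a toy model that tracks only the value of esp, not the stack contents. This is a sketch under that assumption; the function names and the ret_imm parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* esp after a call that returns with "ret $ret_imm".
 * The hidden pointer was stored with mov %eax,(%esp), so esp does not
 * move before the call. */
uint32_t esp_after_struct_call(uint32_t esp, uint32_t ret_imm)
{
    assert(esp >= 4);    /* keep the toy model from wrapping around */
    esp -= 4;            /* call: pushes the return address */
    esp += 4 + ret_imm;  /* ret $imm: pops the return address plus imm bytes */
    return esp;
}

/* The caller's compensation: sub $0x4,%esp. */
uint32_t esp_after_fixup(uint32_t esp)
{
    return esp - 4;
}
```

With ret $0x4 the stack pointer ends up 4 bytes above where it started, and the sub $0x4,%esp brings it back; with a plain ret (ret_imm of 0) no fix-up would be needed.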
On Win32 (using the MSVC compiler, I guess), there is no such ABI rule: a simple ret is used (as expected in cdecl), and the hidden pointer is pushed on the stack at lines 7 and 11. As an optimization, those slots are not freed after each call, but only once, just before the function returns, using add esp,1Ch, which frees the hidden pointer stack slots (2 * 0x4 bytes) and the local AStruct struct (0x14 bytes).
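The 1Ch figure can be double-checked with a few lines of C. This is only a sketch: HIDDEN_POINTER_SIZE and win32_cleanup_bytes are names made up here, and the pointer size is hard-coded to 4 because the machine compiling this may well be 64-bit.

```c
#include <assert.h>
#include <stddef.h>

struct _AStruct {
    int a;
    int b;
    float c;
    float d;
    int e;
};
typedef struct _AStruct AStruct;

/* Size of one pushed hidden pointer on a 32-bit target (assumption:
 * hard-coded, since sizeof(void *) on the build host may be 8). */
#define HIDDEN_POINTER_SIZE 4u

/* Bytes released by the final add esp,... for a function that made
 * `calls` struct-returning calls and holds one local AStruct. */
size_t win32_cleanup_bytes(size_t calls)
{
    /* Assumption: 4-byte int and float, so the struct is 0x14 bytes. */
    assert(sizeof(AStruct) == 0x14);
    return calls * HIDDEN_POINTER_SIZE + sizeof(AStruct);
}
```

For the two calls in the listing, win32_cleanup_bytes(2) gives 2 * 4 + 20 = 28, which is 0x1C.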
Doesn't cdecl define the calling convention for a function returning a structure?!
Unfortunately, it does not; it varies with C compilers and operating systems.
There is no single "cdecl" calling convention. It is defined by the compiler and operating system.
Also, reading the assembly, I am not sure the convention is actually different: in both cases the caller provides a buffer for the output as an extra argument. It's just that gcc chose different instructions (the second extra sub is strange; is that code optimized?).