x86 explanation, number of function arguments and local variables

*爱你&永不变心* submitted on 2021-01-28 04:15:11

Question


The C ABI for the x86-64 system is as follows: Registers rdi, rsi, rdx, rcx, r8, r9 are used to pass arguments in that order. The stack is used for the 7th argument onward. The return value uses the rax register. The rsp register contains the stack pointer.
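
As a rough illustration of the convention just described, here is a minimal C sketch that maps the position of an argument to the place where it is passed. The helper name sysv_int_arg_location is hypothetical, and the sketch assumes every argument is an integer or pointer (floating-point and by-value struct arguments follow other rules).

```c
#include <stddef.h>

/* Hypothetical helper (not part of any real API): returns where the
 * index-th (0-based) integer/pointer argument is passed under the
 * x86-64 calling convention described above.
 * Assumption: all arguments are integers or pointers. */
const char *sysv_int_arg_location(size_t index)
{
    static const char *const regs[] = {
        "rdi", "rsi", "rdx", "rcx", "r8", "r9"
    };
    size_t nregs = sizeof regs / sizeof regs[0];

    /* The first six arguments travel in registers, in this order. */
    if (index < nregs)
        return regs[index];

    /* The 7th argument onward is passed on the stack. */
    return "stack";
}
```

For example, sysv_int_arg_location(0) yields "rdi" and sysv_int_arg_location(6) yields "stack".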

How many function arguments are defined in the below function bloop?

I think there is only one function argument, rdi. Is this correct?

How many local variables (not arguments) are declared in the below function bloop?

I think there is no local variable. Is this correct?

0000000000001139 <bloop>:
    1139:       55                      push   %rbp
    113a:       48 89 e5                mov    %rsp,%rbp
    113d:       48 83 ec 10             sub    $0x10,%rsp
    1141:       48 89 7d f8             mov    %rdi,-0x8(%rbp)
    1145:       48 83 7d f8 29          cmpq   $0x29,-0x8(%rbp)
    114a:       7f 1b                   jg     1167 <bloop+0x2e>
    114c:       48 8b 05 dd 2e 00 00    mov    0x2edd(%rip),%rax
    1153:       48 89 c6                mov    %rax,%rsi
    1156:       48 8d 3d b5 0e 00 00    lea    0xeb5(%rip),%rdi
    115d:       b8 00 00 00 00          mov    $0x0,%eax
    1162:       e8 c9 fe ff ff          callq  1030 <printf@plt>
    1167:       90                      nop
    1168:       c9                      leaveq
    1169:       c3                      retq

Answer 1:


Since this asm is obviously compiler output from anti-optimized debug mode (the default -O0 optimization level), you can assume that all register args get spilled to the stack on function entry. (See the related question "Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?")

So yes, that trivializes reverse engineering and rules out there being any unused function args or args that are passed on to printf in the same register they arrived in.

The stray nop and the use of the leave instruction mean this is probably GCC output, as opposed to clang or ICC. That is only really relevant for ruling out the possibility of const int foo = 0x29; or something similar, which GCC wouldn't optimize away at -O0. ICC and clang produce different asm for source that gets GCC to make this asm. I didn't check every compiler version, just recent versions of these compilers.

(Also, this looks like disassembly of a PIE executable or shared library. The address column on the left would have higher addresses in a traditional position-dependent ELF executable, and a compiler would have used mov $imm32, %edi to put a static address in a register.)


So yes, there's one 64-bit integer/pointer arg (which of course arrives in RDI). The call to printf passes the value of a global or static 64-bit variable, loaded with mov 0x2edd(%rip), %rax and then copied to RSI, along with the address of a global/static format string put into RDI with LEA.
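
The two RIP-relative operands can be resolved by hand: the target is the address of the next instruction plus the signed 32-bit displacement. A minimal sketch of that arithmetic follows. The function name rip_relative_target is made up for illustration, and the instruction lengths are read off the byte column of the disassembly in the question.

```c
#include <stdint.h>

/* Hypothetical helper: computes the address that a RIP-relative
 * operand refers to. RIP holds the address of the *next* instruction,
 * so the target is insn_addr + insn_len + disp. */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                             int32_t disp)
{
    /* Sign-extend the displacement before adding it. */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

With the values from the disassembly, rip_relative_target(0x114c, 7, 0x2edd) gives 0x4030 for the variable and rip_relative_target(0x1156, 7, 0xeb5) gives 0x2012 for the format string. If this really is a PIE or shared library, those are addresses relative to wherever the image ends up being loaded.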

And yes, no locals that I can see unless they're totally unused. At -O0, gcc will optimize away int unused; but not int foo = 123;. Having any locals at all, even register const compare = 0x29; will get GCC to subq $24, %rsp instead of 16 (0x10). (See the Godbolt link below.) And it won't actually do constant-propagation.
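
To make the comparison concrete, here is a hypothetical variant of the function that does declare a local. The name bloop_with_local and the variable limit are invented for illustration. The assumption, based on the -O0 behaviour described above, is that limit would be stored to the stack and read back for the comparison instead of being folded into a cmpq $0x29 immediate, so the resulting asm would not match the asm in the question.

```c
#include <stdio.h>

long global_var;

/* Hypothetical variant of bloop with one real local variable.
 * Assumption: at -O0, GCC keeps `limit` in memory and compares
 * against it there, because it does no constant-propagation. */
void bloop_with_local(long x)
{
    const long limit = 0x29;   /* initialized local: not optimized away at -O0 */

    if (!(x > limit))
        printf("%ld", global_var);
}
```

Compiling a variant like this with the same GCC version and flags and comparing the output is a quick way to check whether a guessed source matches a given disassembly.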


I can get GCC9.3 -O0 to produce exactly this asm from this source code:

#include <stdio.h>
long global_var;

void bloop(long x) {
    if (!(x>0x29))
        printf("%ld", global_var);
}

on Godbolt with gcc9.3 -O0 -fpie -fverbose-asm:

# godbolt strips out directives like .section .rodata
.LC0:
        .string "%ld"

bloop:
        pushq   %rbp  #
        movq    %rsp, %rbp      #,
        subq    $16, %rsp       #,
        movq    %rdi, -8(%rbp)  # x, x
        cmpq    $41, -8(%rbp)   #, x
        jg      .L3 #,
        movq    global_var(%rip), %rax  # global_var, global_var.0_1
        movq    %rax, %rsi      # global_var.0_1,
        leaq    .LC0(%rip), %rdi        #,
        movl    $0, %eax        #,
        call    printf@PLT      #
.L3:
        nop     
        leave   
        ret

The nop has no purpose; I don't know why unoptimized GCC output sometimes has one.

See also How to remove "noise" from GCC/clang assembly output? for more about looking at compiler output.




Answer 2:


Both mov and nop are instructions. An instruction is something the processor executes and is what makes up a machine program. If you are unfamiliar with this concept, it might be helpful to read a tutorial on assembly programming.

What instructions a function uses is largely unrelated to how many arguments and local variables it has. The presence of a nop and some mov instruction tells you nothing about the arguments and variables of a function.

What does tell you something is which operands these instructions have. If you are unfamiliar with what operands are or how x86 instructions use their operands, I must once again ask you to refer to a tutorial, as this is out of scope of this question.

The general approach to identifying function arguments is checking what caller-saved registers the function uses without previously assigning a value for them. While this is not a fool-proof way, it's usually the best heuristic there is.

In your function, the caller-saved registers rdi, rsi, and rax are used. Of these, only the original value of rdi has an effect on the function. As for rsi and rax, the function overwrites their original value without having a look at it. Thus these are unlikely to be function arguments (rax is never used for a function argument in the SysV calling convention). The function hence likely has one argument in rdi. I don't see any access to stack slots allocated by the caller, so it's unlikely that any extra arguments are hidden there either.
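
The heuristic applied above can be sketched in a few lines of C. Everything here is an illustrative assumption: the type insn_effect, the function likely_arg_regs, and the table bloop_effects are hypothetical names, the read/write sets are summarized by hand from the disassembly rather than produced by a disassembler, and the scan simply walks the instructions in address order without following branches.

```c
#include <stddef.h>

/* One bit per register of interest. */
enum {
    RAX = 1 << 0, RDI = 1 << 1, RSI = 1 << 2, RDX = 1 << 3,
    RCX = 1 << 4, R8  = 1 << 5, R9  = 1 << 6,
    RBP = 1 << 7, RSP = 1 << 8
};

#define ARG_REGS (RDI | RSI | RDX | RCX | R8 | R9)

/* Hand-written summary of one instruction: the registers it reads
 * and the registers it overwrites. */
struct insn_effect {
    unsigned reads;
    unsigned writes;
};

/* Returns the argument registers that are read before the function
 * has written them itself. Simplification: linear scan, no control
 * flow analysis. */
unsigned likely_arg_regs(const struct insn_effect *code, size_t n)
{
    unsigned written = 0, read_first = 0;

    for (size_t i = 0; i < n; i++) {
        read_first |= code[i].reads & ~written;
        written |= code[i].writes;
    }
    return read_first & ARG_REGS;
}

/* bloop from the question, summarized by hand. */
const struct insn_effect bloop_effects[] = {
    { RBP | RSP, RSP },         /* push   %rbp              */
    { RSP, RBP },               /* mov    %rsp,%rbp         */
    { RSP, RSP },               /* sub    $0x10,%rsp        */
    { RDI | RBP, 0 },           /* mov    %rdi,-0x8(%rbp)   */
    { RBP, 0 },                 /* cmpq   $0x29,-0x8(%rbp)  */
    { 0, 0 },                   /* jg     1167              */
    { 0, RAX },                 /* mov    0x2edd(%rip),%rax */
    { RAX, RSI },               /* mov    %rax,%rsi         */
    { 0, RDI },                 /* lea    0xeb5(%rip),%rdi  */
    { 0, RAX },                 /* mov    $0x0,%eax         */
    { RDI | RSI | RAX, RAX },   /* callq  printf@plt        */
    { 0, 0 },                   /* nop                      */
    { RBP, RBP | RSP },         /* leaveq                   */
    { RSP, RSP },               /* retq                     */
};
```

Calling likely_arg_regs on bloop_effects leaves only the RDI bit set, which matches the conclusion reached by hand. An unused argument would be invisible to this scan, for the reason given in the next paragraph.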

It could still be that the function was written to have arguments in rsi or some other registers and these arguments simply went unused. We'll never know for sure without extra information (e.g. debug symbols, disassembly of the call site, etc.).

As for local variables: there's in general no way to reconstruct what local variables a C function used when it has been compiled into assembly because the compiler can optimise local variables to the point where their existence is unrecognisable. It may also add additional local variables for various purposes.

However, in your specific case it is likely that the function was compiled with optimisations turned off. In this case, many C compilers compile C code in a very straightforward and predictable manner where one stack slot is allocated for each local variable and each memory access to the local variable generates one load or store to that stack slot.

It is however still not possible to say with absolute certainty what types these variables might have had or if two adjacent stack slots are two separate variables, one variable of a particularly large type (e.g. long double) or a variable of structure or array type. We'll again never know.

In your example, two stack slots of 8 bytes each are allocated by the instruction sub $0x10, %rsp. As the compiler must allocate stack slots in 16-byte increments for alignment, this means that the original function has at least one variable (of a 64-bit type), but could have as many as nine (the others being of char type).
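
The arithmetic behind that estimate can be sketched as follows. The helper name round_up_16 is hypothetical, and the sketch assumes the compiler simply rounds the bytes needed for locals up to a multiple of 16; a real compiler's frame layout involves more than this.

```c
#include <stddef.h>

/* Hypothetical helper: rounds a byte count up to the next multiple
 * of 16, the allocation granularity assumed in the text above. */
size_t round_up_16(size_t bytes)
{
    return (bytes + 15) & ~(size_t)15;
}
```

A single 8-byte variable gives round_up_16(8) == 16, and so does one 8-byte variable plus eight chars (8 + 8 bytes). One more byte would push the allocation to 32, which is where the upper bound of nine variables comes from.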

As only one of the stack slots (-0x8(%rbp)) ends up being accessed, we can only say for sure that the function has at least one variable. Since the access occurs with a 64-bit width, it is likely that said variable has a type that is 64 bits wide. The function could have extra unused local variables, or the variable it has could be a structure with multiple members or an array, of which only the first member is accessed. We can't say for sure.

It is also possible that no local variable exists and the compiler decided to use -0x8(%rbp) to spill some expression for some reason (it likes to do nonsensical spills like this when optimisations are turned off), but that seems unlikely.

So in summary: it is generally not possible to find out exactly what a C function looked like judging from the machine code, but you can often make an educated guess that gets you pretty far.

Hence, it is generally more useful to think in terms of “what could a C function with this machine code look like?” rather than “what did the C function look like that generated this machine code?” as you can never be certain.



Source: https://stackoverflow.com/questions/61416433/x86-explanation-number-of-function-arguments-and-local-variables
