Writing a thunk to verify SysV ABI compliance

Question

The x86-64 SysV ABI defines the C-level and assembly calling conventions used on Linux.

I would like to write a generic thunk that verifies that a function satisfies the ABI's restrictions on callee-preserved registers and (perhaps) whether it tried to return a value.

So, given a target function like int foo(int, int), it's pretty easy³ to write such a thunk in assembly, something like¹:

foo_thunk:
push rbp
push rbx
push r12
push r13
push r14
push r15
call foo
cmp rbp, [rsp + 40]
jne bad_rbp
cmp rbx, [rsp + 32]
jne bad_rbx
cmp r12, [rsp + 24]
jne bad_r12
cmp r13, [rsp + 16]
jne bad_r13
cmp r14, [rsp + 8]
jne bad_r14
cmp r15, [rsp]
jne bad_r15
ret
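For reference, here is a sketch of that per-function thunk written as GNU top-level asm (AT&T syntax) inside a C file. It assumes x86-64 Linux and GCC or Clang. Unlike the version above, it realigns the stack before the call, which addresses the alignment problem noted in footnote ³. Instead of jumping to bad_* labels, it records clobbered registers in a bitmask and then restores the caller's values. The names abi_clobber_mask and CHECK are made up for this example.

```c
#include <assert.h>

/* Target function; must not be static, because only the asm refers to it. */
int foo(int a, int b) { return a * 10 + b; }

/* The thunk has foo's prototype, so rdi/rsi pass straight through. */
int foo_thunk(int a, int b);

/* Hypothetical report: bit 0 = r15, 1 = r14, 2 = r13, 3 = r12, 4 = rbx, 5 = rbp. */
unsigned abi_clobber_mask;

/* Compare the saved copy at OFF(%rsp) with REG; set BIT in r11d if they differ. */
#define CHECK(reg, off, bit)               \
    "    cmp " #off "(%rsp), %" #reg "\n"  \
    "    je 1f\n"                          \
    "    or $" #bit ", %r11d\n"            \
    "1:\n"

__asm__(
    ".pushsection .text\n"
    ".globl foo_thunk\n"
    ".type foo_thunk, @function\n"
    "foo_thunk:\n"
    "    push %rbp\n"
    "    push %rbx\n"
    "    push %r12\n"
    "    push %r13\n"
    "    push %r14\n"
    "    push %r15\n"
    "    sub $8, %rsp\n"          /* 6 pushes leave rsp 8 mod 16: realign */
    "    call foo\n"
    "    add $8, %rsp\n"          /* saved r15 is now at 0(%rsp) */
    "    xor %r11d, %r11d\n"      /* r11 is scratch, not a return register */
    CHECK(r15, 0, 1)
    CHECK(r14, 8, 2)
    CHECK(r13, 16, 4)
    CHECK(r12, 24, 8)
    CHECK(rbx, 32, 16)
    CHECK(rbp, 40, 32)
    "    mov %r11d, abi_clobber_mask(%rip)\n"
    "    pop %r15\n"              /* popping restores the caller's values */
    "    pop %r14\n"
    "    pop %r13\n"
    "    pop %r12\n"
    "    pop %rbx\n"
    "    pop %rbp\n"
    "    ret\n"                   /* eax still holds foo's return value */
    ".size foo_thunk, .-foo_thunk\n"
    ".popsection\n");
```

A call such as foo_thunk(2, 3) then behaves like foo(2, 3). Afterwards, abi_clobber_mask reads 0 as long as foo honors the ABI.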

Now of course I don't actually want to write a separate foo_thunk function for every target; I just want one generic one. It should take a pointer to the underlying function (let's say in rax) and use an indirect call (call rax) instead of call foo, but would otherwise be the same.

What I can't figure out is how to implement the transparent use of the thunk at the C level (or in C++, where there seem to be more metaprogramming options, but let's stick to C here). I want to take something like:

foo(1, 2);

and translate it into a call to the thunk, while still passing the same arguments in the same places (the thunk needs that in order to work).

I expect to have to modify the source, perhaps with macro or template magic, so that the call above could be changed to:

CHECK_THUNK(foo, (1, 2));

This gives the macro the name of the underlying function. In principle it could translate this to²:

check_thunk(&foo, 1, 2);

How can I declare check_thunk though? The first argument is "some type" of function pointer. We could try:

check_thunk(void (*ptr)(void), ...);

That is, a "generic" function pointer (any function pointer can validly be converted to this type, and we'll only actually call it from assembly, outside the claws of the language standard), plus varargs.

This doesn't work, though: arguments that match the ... only get the default argument promotions, which are totally different from the conversions a properly prototyped function applies. It will work for the foo(1, 2) example, but if you call foo(1.0, 2) instead, the varargs version will just leave the 1.0 as a double and you'll be calling foo with a totally wrong value (a double value punned as an integer).
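A small portable C illustration of the difference follows. Under foo's prototype, 1.0 is converted to the int 1. An argument that matches ... only gets the default argument promotions (float to double, small integer types to int), so it is still a double when the callee reads it. first_vararg_as_double is a hypothetical stand-in for a variadic check_thunk.

```c
#include <assert.h>
#include <stdarg.h>

/* A prototyped target: each argument is converted to the parameter type. */
int foo(int a, int b) { return a * 10 + b; }

/* Hypothetical variadic stand-in for check_thunk(ptr, ...). Arguments that
 * match "..." get only the default argument promotions, so a floating-point
 * argument arrives as a double, never as an int. */
double first_vararg_as_double(int dummy, ...)
{
    va_list ap;
    va_start(ap, dummy);
    double d = va_arg(ap, double);
    va_end(ap);
    return d;
}
```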

The above also has the disadvantage of passing the function pointer as the first argument, which means the thunk no longer works as-is. It has to stash the function pointer from rdi somewhere and then shift all the other arguments over by one register (e.g., mov rdi, rsi). If there are non-register args, things get really messy.

Is there any way to make this work smoothly?

Note: this type of thunk is basically incompatible with passing any parameters on the stack, which is an acceptable limitation of this approach (it should simply not be used for functions with that many arguments or with MEMORY-class arguments).

¹ This only checks the callee-preserved registers, but the other checks are similarly straightforward.

² In fact, you don't even really need the macro for that, but it's also there so you can turn off the thunk in release builds and just make a direct call.

³ Well, by "easy" I mean a thunk that doesn't work in all cases. The thunk shown doesn't correctly align the stack (easy to fix) and breaks if foo has any stack-passed arguments (significantly harder to fix).

Answer 1:

One way to do this, in a gcc-specific way, is to take advantage of typeof and nested functions to create a function pointer that embeds the call to the underlying function, but itself doesn't have any arguments.

This pointer can be passed to the thunk function, which calls it and verifies ABI compliance.

Here's an example of transforming a call to int add3(int, int, int) using this method:

The original call looks like:

int res = add3(a, b, c);

Then you wrap the call in a macro, like this²:

CALL_THUNKED(int res, add3, (a,b,c));

... which expands into something like:

    typedef typeof(add3  (a,b,c)) ret_type; 

    ret_type closure() {              
        return add3  (a,b,c);         
    }                                 
    typedef ret_type (*typed_closure)(void);  
    typedef ret_type (*thunk_t)(typed_closure); 

    thunk_t thunk = (thunk_t)closure_thunk; 
    int res = thunk(&closure);

We create the nested closure() function (GCC builds a small trampoline for it on the stack), which calls directly into add3 with the original arguments. We can take the address of this closure and pass it to an asm function without difficulty: calling it will have the ultimate effect of calling add3 with the arguments¹.

The rest of the typedefs basically deal with the return type. We have only a single closure_thunk function, declared as void* closure_thunk(void (*)(void)); and implemented in assembly. It takes a function pointer (any function pointer is convertible to any other), but its return type is "wrong". We cast it to thunk_t, a typedef that the macro generates at each call site for a function with the "right" return type.
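The ret_type typedef works because GNU C's typeof, like sizeof, does not evaluate its operand, so add3 is not actually called there. Here is a minimal check of that. It uses a hypothetical counter-instrumented function, twice, in place of add3, and mirrors the typed_closure and thunk_t typedefs.

```c
#include <assert.h>

static int calls;                           /* counts real calls to twice() */
static long twice(long x) { ++calls; return 2 * x; }

/* Like the macro's ret_type: names the call's result type without calling it. */
typedef __typeof__(twice(21)) twice_ret;

/* The pointer types the macro builds from it. */
typedef twice_ret (*typed_closure)(void);
typedef twice_ret (*thunk_t)(typed_closure);
```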

Of course, calling a function through a pointer of an incompatible type is undefined behavior for C functions, but we are implementing the function in asm, so we kind of sidestep the issue. (If you wanted to be a bit more compliant, you could ask the asm code for a function pointer of the right type, which it can "generate" each time, outside the reach of the standard; of course, it would just return the same pointer each time.)

The closure_thunk function in asm is implemented along the lines of:

GLOBAL closure_thunk:function

closure_thunk:

push rsi
push_callee_saved

call rdi

; set up the function name
mov rdi, [rsp + 48]

; now check whether any regs were clobbered
cmp rbp, [rsp + 40]
jne bad_rbp
cmp rbx, [rsp + 32]
jne bad_rbx
cmp r12, [rsp + 24]
jne bad_r12
cmp r13, [rsp + 16]
jne bad_r13
cmp r14, [rsp + 8]
jne bad_r14
cmp r15, [rsp]
jne bad_r15

add rsp, 7 * 8
ret

That is, push all the registers we want to check onto the stack (along with the function name), call the function in rdi, and then do the checks. The bad_* handlers aren't shown, but they basically spit out an error message like "Function add3 overwrote rbp... naughty!" and abort() the process.
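Below is a sketch of the same idea that compiles as a single C file. It assumes x86-64 Linux and GCC, since nested functions are GCC-only. It differs from the answer's code in a few deliberate ways:

- The asm uses AT&T syntax.
- It does not carry the function name.
- It realigns the stack before the call.
- It records clobbers in a hypothetical abi_clobber_mask instead of aborting.
- CALL_THUNKED uses a statement expression, as footnote ² suggests.

clobbers_rbx is a deliberately broken asm function that exists only to show a detection.

```c
#include <assert.h>

/* Hypothetical report: bit 0 = r15, 1 = r14, 2 = r13, 3 = r12, 4 = rbx, 5 = rbp. */
unsigned abi_clobber_mask;

/* Declared as in the answer; each call site casts it to the right type. */
void *closure_thunk(void (*fn)(void));

/* Deliberately broken function (defined in the asm below): trashes rbx. */
int clobbers_rbx(void);

/* Compare the saved copy at OFF(%rsp) with REG; set BIT in r11d if they differ. */
#define CHECK(reg, off, bit)               \
    "    cmp " #off "(%rsp), %" #reg "\n"  \
    "    je 1f\n"                          \
    "    or $" #bit ", %r11d\n"            \
    "1:\n"

__asm__(
    ".pushsection .text\n"
    ".globl closure_thunk\n"
    ".type closure_thunk, @function\n"
    "closure_thunk:\n"
    "    push %rbp\n"
    "    push %rbx\n"
    "    push %r12\n"
    "    push %r13\n"
    "    push %r14\n"
    "    push %r15\n"
    "    sub $8, %rsp\n"             /* keep the call 16-byte aligned */
    "    call *%rdi\n"               /* the closure takes no arguments */
    "    add $8, %rsp\n"
    "    xor %r11d, %r11d\n"
    CHECK(r15, 0, 1)
    CHECK(r14, 8, 2)
    CHECK(r13, 16, 4)
    CHECK(r12, 24, 8)
    CHECK(rbx, 32, 16)
    CHECK(rbp, 40, 32)
    "    mov %r11d, abi_clobber_mask(%rip)\n"
    "    pop %r15\n"                 /* restore the caller's values */
    "    pop %r14\n"
    "    pop %r13\n"
    "    pop %r12\n"
    "    pop %rbx\n"
    "    pop %rbp\n"
    "    ret\n"                      /* rax, rdx, xmm0 untouched since the call */
    ".size closure_thunk, .-closure_thunk\n"
    ".globl clobbers_rbx\n"
    ".type clobbers_rbx, @function\n"
    "clobbers_rbx:\n"
    "    not %rbx\n"                 /* ABI violation: rbx is callee-saved */
    "    mov $7, %eax\n"
    "    ret\n"
    ".popsection\n");

/* GCC-only: statement expression + typeof + nested function. */
#define CALL_THUNKED(fn, args) ({                                    \
    typedef __typeof__(fn args) ct_ret_t;                            \
    ct_ret_t ct_closure(void) { return fn args; }                    \
    ((ct_ret_t (*)(ct_ret_t (*)(void)))closure_thunk)(ct_closure);   \
})

int add3(int a, int b, int c) { return a + b + c; }
```

The thunk pops the saved values on the way out, so the caller survives even a misbehaving callee. The report can therefore be inspected after the call instead of aborting.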

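This breaks if any arguments are passed on the stack. Return values that are returned in memory need care as well: for those, the caller passes the address of the return buffer in rdi as a hidden first argument (the callee hands it back in rax), so the closure pointer would arrive in rsi rather than in rdi.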

¹ How this is accomplished is kind of magic: gcc actually writes a few bytes of executable code (a trampoline) onto the stack, and the closure function pointer points there. Those few bytes load a register with a pointer to the region that contains the captured variables (a, b, c in this case) and then jump to the actual (read-only) closure() code. That code can then access the captured variables through that pointer (and pass them to add3).

² As it turns out, we could probably use gcc's statement-expression syntax to write the macro with a more usual function-like syntax, something like int res = CALL_THUNKED(add3, (a,b,c)).
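The trampoline mechanism from footnote ¹ can be seen in isolation with a plain GCC nested function whose address escapes. run_closure is a hypothetical stand-in for the asm thunk; it only ever sees a no-argument function pointer. This assumes GCC on x86-64, where the static-chain register is r10.

```c
#include <assert.h>

int add3(int a, int b, int c) { return a + b + c; }

/* Stand-in for the asm thunk: it only sees a no-argument function pointer.
 * noipa stops GCC from inlining or specializing it, so the closure's
 * address really has to escape. */
__attribute__((noipa)) int run_closure(int (*fn)(void)) { return fn(); }

int call_add3_via_closure(int a, int b, int c)
{
    /* GCC nested function: reads a, b and c from the enclosing frame. Since
     * its address escapes, GCC typically builds a small trampoline on the
     * stack that loads the static-chain register (r10 on x86-64) with a
     * pointer to that frame and then jumps to closure's ordinary code. */
    int closure(void) { return add3(a, b, c); }
    return run_closure(closure);
}
```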

Answer 2:

At the C source level (without modifying gcc or the linker to insert the thunk for you), you could define different prototypes for each thunk but still share the same implementation.

You could put multiple labels on the definition in the asm source, so check_thunk_foo has the same address as check_thunk_bar, but you can use a different C prototype for each.

Or you could make weak aliases like this:

int check_thunk_foo(void*, int, int) 
    __attribute__ ((weak, alias ("check_thunk_generic")));
// or maybe this should be ((weakref ("check_thunk_generic")))

#define foo(...) check_thunk_foo((void*)&foo, __VA_ARGS__)

// or to put the args in their original slots,
// but then you'd need different thunks for different numbers of integer args.
#define foo(x, y) check_thunk_foo((x), (y), (void*)&foo)
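Here is a sketch of the multiple-labels variant, assuming x86-64 Linux and GCC or Clang. One asm body carries two labels, and each label gets its own C prototype, so foo's and bar's argument conversions still apply. Following the second #define above, the target pointer goes last. With exactly two integer arguments, it lands in rdx, because FP arguments use xmm registers and don't shift it. bar, abi_clobber_mask and CHECK are hypothetical names.

```c
#include <assert.h>

/* Hypothetical report: bit 0 = r15, 1 = r14, 2 = r13, 3 = r12, 4 = rbx, 5 = rbp. */
unsigned abi_clobber_mask;

int    foo(int a, int b)           { return a * 10 + b; }
double bar(int a, double x, int b) { return a + x * b; }

/* Two C prototypes for one asm body; the target is the 3rd integer arg (rdx). */
int    check_thunk_foo(int, int, int (*)(int, int));
double check_thunk_bar(int, double, int, double (*)(int, double, int));

/* Compare the saved copy at OFF(%rsp) with REG; set BIT in r11d if they differ. */
#define CHECK(reg, off, bit)               \
    "    cmp " #off "(%rsp), %" #reg "\n"  \
    "    je 1f\n"                          \
    "    or $" #bit ", %r11d\n"            \
    "1:\n"

__asm__(
    ".pushsection .text\n"
    ".globl check_thunk_foo\n"
    ".globl check_thunk_bar\n"
    ".type check_thunk_foo, @function\n"
    ".type check_thunk_bar, @function\n"
    "check_thunk_foo:\n"                 /* both labels share one body */
    "check_thunk_bar:\n"
    "    push %rbp\n"
    "    push %rbx\n"
    "    push %r12\n"
    "    push %r13\n"
    "    push %r14\n"
    "    push %r15\n"
    "    sub $8, %rsp\n"                 /* keep the call 16-byte aligned */
    "    call *%rdx\n"                   /* rdi, rsi, xmm0 pass through untouched */
    "    add $8, %rsp\n"
    "    xor %r11d, %r11d\n"
    CHECK(r15, 0, 1)
    CHECK(r14, 8, 2)
    CHECK(r13, 16, 4)
    CHECK(r12, 24, 8)
    CHECK(rbx, 32, 16)
    CHECK(rbp, 40, 32)
    "    mov %r11d, abi_clobber_mask(%rip)\n"
    "    pop %r15\n"
    "    pop %r14\n"
    "    pop %r13\n"
    "    pop %r12\n"
    "    pop %rbx\n"
    "    pop %rbp\n"
    "    ret\n"
    ".popsection\n");

/* Route calls through the typed thunks. These come after the definitions
 * of foo and bar so that the definitions are not rewritten. */
#define foo(x, y)    check_thunk_foo((x), (y), &foo)
#define bar(a, x, b) check_thunk_bar((a), (x), (b), &bar)
```

With these macros, foo(1.9, 2) still converts 1.9 to int under check_thunk_foo's prototype before the thunk runs.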

The major downside of this is that you need to copy and modify the original prototype for every function. You could hack around this with CPP macros so there's a single point of definition for the arg list, which both the real prototype and the thunk (if enabled) use. One way is to include the same .h twice, with a wrapper macro defined differently each time: once for the real prototypes, and again for the thunks.
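The single-point-of-definition idea can be sketched with an X-macro. To keep the example self-contained, a list macro stands in for the twice-included header. ABI_CHECKED_FUNCS, WITH_TARGET and the DECLARE_* helpers are hypothetical names. The sketch generates the real prototypes and thunk prototypes in the check_thunk_foo(void*, int, int) style; the asm body they would alias is not included.

```c
#include <assert.h>

/* Hypothetical single point of definition: return type, name and parameter
 * list of every function that should get a checking thunk. */
#define ABI_CHECKED_FUNCS(X)                  \
    X(int,    foo, (int a, int b))            \
    X(double, bar, (int a, double x, int b))

/* Prepends the hidden target-pointer parameter to a parameter list. */
#define WITH_TARGET(...) (void *target, __VA_ARGS__)

/* Pass 1: the real prototypes. */
#define DECLARE_REAL(ret, name, params) ret name params;
ABI_CHECKED_FUNCS(DECLARE_REAL)

/* Pass 2: one thunk prototype per function (declarations only here). */
#define DECLARE_THUNK(ret, name, params) ret check_thunk_##name WITH_TARGET params;
ABI_CHECKED_FUNCS(DECLARE_THUNK)

int    foo(int a, int b)           { return a * 10 + b; }
double bar(int a, double x, int b) { return a + x * b; }
```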

BTW, passing the function pointer as an extra arg to a generic thunk is potentially problematic. I don't think it's possible to reliably remove the first arg and forward the rest in the x86-64 SysV ABI. For functions that take more than 6 integer args, you don't know how many stack args there are, and you don't know if there are FP stack args before the first integer stack arg.

This should work fine for functions that pass all their register-possible args in registers. (i.e. if there are any stack args, they're large structs by value or other things that couldn't go in an integer register.)

To solve this problem, the thunk could dispatch based on return address instead of an extra hidden arg, if you had something like debug info to map call site return addresses to call targets. Or you could maybe get gcc to pass a hidden arg in rax or r11. Running call from inline asm sucks a lot, so you'd maybe need to customize gcc with support for some special attribute that passed a function pointer in an extra register.

Regarding this part of the question: "but if you call foo(1.0, 2) instead, the varargs version will just leave the 1.0 as a double and you'll be calling foo with a totally wrong value (a double value punned as an integer)."

Not that it matters, but no, you'd be calling foo(2, garbage) with xmm0=(double)1.0. Variadic functions still use register args the same as non-variadic functions (or with the option of passing FP args on the stack before you run out of registers, and setting al= less than 8).

Source: https://stackoverflow.com/questions/46905229/writing-a-thunk-to-verify-sysv-abi-compliance

Tags

Linux

x86

function-pointers

interceptor