Question
As the title says: when an asm statement modifies some registers only as temporaries, which option is better, a clobber or a dummy output?
For example, I implemented two versions of the exchange function in the link, and found that both versions generate the same number of instructions.
Which version should I use? Should I use the one with the dummy output, so that the compiler can choose the register and optimize the whole function as much as possible?
If the answer is yes, then when should I use the clobber list? Is it only appropriate when an instruction requires its operands in specific registers, such as the syscall instruction, which requires its parameters in rdi, rsi, rdx, r10, r8, and r9?
Answer 1:
You should normally let the compiler pick registers for you, using an early-clobber dummy output with any required constraints [1]. This gives it flexibility to do register allocation for the function.
[1] For example, you can use =&Q to get one of RAX/RBX/RCX/RDX, the registers that have an AH/BH/CH/DH. If you wanted to unpack 8-bit fields with movzbl %h[input], %[high_byte] ; movzbl %b[input], %[low_byte] ; shr $16, %[input], you'd need a register that has its second 8-bit chunk aliased to a high-8 register.
Out of curiosity: under the amd64 calling convention, some registers can be used freely inside a function, so we could implement some functions using only those registers inside the asm statement. Why is letting the compiler choose the registers better than that?
Because functions can inline, maybe into a loop that calls other functions, in which case the compiler would want to give it inputs in call-preserved registers. If you were writing a stand-alone function that the compiler always has to call, all you'd gain from inline asm over a stand-alone asm function is the compiler handling calling-convention differences and C++ name mangling.
Or maybe the surrounding code uses some instructions that require fixed registers, like cl for shift counts or RDX:RAX for div.
When should I use the clobber list? ... such as the syscall instruction, which requires its parameters in rdi, rsi, rdx, r10, r8, and r9?
Normally you'd use input constraints instead, so only the syscall instruction itself is inside the inline asm. But syscall (the instruction itself) clobbers RCX and R11, so system calls made using it unavoidably destroy user-space's RCX and R11. There's no point using dummy outputs for these, unless you have a use for the return address (RCX) or RFLAGS (R11). So yes, clobbers are useful here.
#include <stddef.h>
#include <asm/unistd.h>
// the compiler will emit all the necessary MOV instructions
//static inline
size_t sys_write(int fd, const char *buf, size_t len) {
    size_t retval;
    asm volatile("syscall"
        : "=a"(retval)
        : "a"(__NR_write), "D"(fd), "S"(buf), "d"(len)  // RAX, EDI, RSI, RDX
        , "m"(*(char (*)[len]) buf)  // dummy memory input: the asm statement reads this memory
        : "rcx", "r11"               // clobbered by syscall
        // , "memory"                // would be needed if we didn't use a dummy memory input
    );
    return retval;
}
A non-inline version of this compiles as follows (with gcc -O3 on the Godbolt compiler explorer), because the function-calling convention nearly matches the system-call convention:
sys_write(int, char const*, unsigned long):
        movl    $1, %eax
        syscall
        ret
It would have been really silly to use clobbers on any of the input registers and put a mov inside the asm:
size_t dumb_sys_write(int fd, const char *buf, size_t len) {
    size_t retval;
    asm volatile(
        "mov %[fd], %%edi\n\t"
        "mov %[buf], %%rsi\n\t"
        "mov %[len], %%rdx\n\t"
        "syscall"
        : "=a"(retval)
        : "a"(__NR_write), [fd]"r"(fd), [buf]"r"(buf), [len]"r"(len)
        , "m"(*(char (*)[len]) buf)  // dummy memory input: the asm statement reads this memory
        : "rdi", "rsi", "rdx", "rcx", "r11"  // EDI, RSI, RDX are written by the mov instructions
        // , "memory"                // would be needed if we didn't use a dummy memory input
    );
    // if(retval > -4096ULL) errno = -retval;
    return retval;
}
dumb_sys_write(int, char const*, unsigned long):
        movl    %edi, %r9d
        movq    %rsi, %r8
        movq    %rdx, %r10
        movl    $1, %eax      # compiler generated before this
        # from inline asm
        mov %r9d, %edi
        mov %r8, %rsi
        mov %r10, %rdx
        syscall
        # end of inline asm
        ret
And besides that, you're not letting the compiler take advantage of the fact that syscall doesn't clobber any of its input registers. The compiler might well still want len in a register, and using a pure input constraint lets it know that the value will still be there afterwards.
You might also use clobbers if you're using any instructions that implicitly use certain registers, but neither the input nor output of those instructions is a direct input or output of the asm statement. That would be rare, though, unless you're writing a whole loop or large block of code in inline asm.
Or maybe if you're wrapping a call instruction. (It's hard to do this safely, especially because of the red-zone, but people do try to do this). You don't get to choose which registers the code clobbers, so you just tell the compiler about it.
Source: https://stackoverflow.com/questions/54061267/for-temporary-registers-in-the-asm-statement-should-i-use-clobber-or-dummy-outp