I've been trying to understand the purpose of the 0x40 REX prefix byte in x64 assembly instructions. For instance, in this function prologue from Kernel32.dll:
As you can see, they encode push rbx as:
40 53 push rbx
But using just the 53h opcode (without the prefix) also produces the same result:
According to this site, the layout for the REX prefix is as follows:
So the 40h byte seems to do nothing. Can someone explain its purpose?
The 04xh bytes (i.e. 040h, 041h, ..., 04fh) are indeed REX bytes. Each bit in the lower nibble has a meaning, as you listed in your question. The value 040h means that REX.W, REX.R, REX.X and REX.B are all 0. That means that adding this byte doesn't do anything to this instruction, because you're not overriding any default REX bits, and it's not an 8-bit instruction with AH/BH/CH/DH as an operand (when any REX prefix is present, those register encodings select SPL/BPL/SIL/DIL instead).
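To make the bit layout concrete, here is a minimal Python sketch that splits a REX byte into its four flag bits. The function name decode_rex is made up for this illustration; the bit positions (W = bit 3, R = bit 2, X = bit 1, B = bit 0, with the upper nibble fixed at 4) are the standard REX layout.

```python
def decode_rex(byte):
    """Split a REX prefix byte (0x40-0x4F) into its W, R, X and B bits."""
    if byte & 0xF0 != 0x40:
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 64-bit operand size
        "R": (byte >> 2) & 1,  # extends ModRM.reg
        "X": (byte >> 1) & 1,  # extends SIB.index
        "B": byte & 1,         # extends ModRM.rm, SIB.base or the opcode register field
    }


if __name__ == "__main__":
    print(decode_rex(0x40))  # all four bits are 0
    print(decode_rex(0x48))  # only W is set
```

For 040h every bit comes back as 0, which is why the prefix changes nothing in front of push rbx.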
Moreover, the X, R and B bits each extend the encoding of one particular operand field. If your instruction doesn't use that field, the corresponding REX bit is ignored.
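As an illustration of how a REX bit maps onto an operand, the following Python sketch decodes only the one-byte push r64 form (opcodes 50h to 57h), where REX.B supplies the fourth bit of the register number. The helper decode_push and the table REGS64 are hypothetical names for this example; it is not a general disassembler.

```python
# 64-bit general-purpose registers in encoding order (0-15).
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_push(code):
    """Decode 'push r64' from raw bytes, with an optional single REX prefix."""
    code = bytes(code)
    rex_b = 0
    if len(code) == 2 and code[0] & 0xF0 == 0x40:
        rex_b = code[0] & 1      # only REX.B matters for this opcode form
        code = code[1:]
    if len(code) != 1 or not 0x50 <= code[0] <= 0x57:
        raise ValueError("not a push r64 encoding")
    # Low 3 bits of the opcode plus REX.B as bit 3 give the register number.
    return "push " + REGS64[(rex_b << 3) | (code[0] & 7)]


if __name__ == "__main__":
    for encoding in ("53", "4053", "4153"):
        print(encoding, "->", decode_push(bytes.fromhex(encoding)))
```

Both 53 and 40 53 decode to push rbx, while 41 53 decodes to push r11, because only the B bit feeds into this instruction's register field.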
I call this a dummy REX prefix, because it does nothing before a push or pop. I wondered whether it is allowed, and your experience shows that it is.
It is there because that is apparently how Microsoft's tools generated the code above. I'd speculate that the prefix is needed for the extra registers (r8 to r15), so they always generate it and didn't bother to remove it when it isn't needed. Another possibility is that lengthening the instruction has a subtle effect on scheduling and/or alignment and can make the code faster. Judging that, of course, requires detailed knowledge of the particular processor.
I'm working on an optimiser that looks at machine code. Dummy prefixes are helpful because they make the code more uniform; there are fewer cases to consider. Then, as a last step, superfluous prefixes can be removed, among other things.
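A rough Python sketch of such a final clean-up step, under stated assumptions: the input is a list of instructions that have already been split at instruction boundaries, and only a bare 40h in front of a one-byte push/pop r64 (opcodes 50h to 5Fh) is treated as removable. The name strip_dummy_rex is hypothetical and this is not the optimiser described above.

```python
def strip_dummy_rex(instructions):
    """Drop a leading 0x40 from two-byte push/pop r64 instructions.

    `instructions` is an iterable of byte strings, one per instruction.
    Anything that is not exactly '40 5x' is passed through unchanged.
    """
    cleaned = []
    for insn in instructions:
        insn = bytes(insn)
        if len(insn) == 2 and insn[0] == 0x40 and 0x50 <= insn[1] <= 0x5F:
            insn = insn[1:]  # the prefix carries no set bits, so it is redundant
        cleaned.append(insn)
    return cleaned


if __name__ == "__main__":
    code = [bytes.fromhex(h) for h in ("4053", "4153", "4889e5")]
    print([insn.hex() for insn in strip_dummy_rex(code)])
```

Removing bytes shifts everything that follows, so a real tool would also have to fix up relative branch and call offsets afterwards.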
Source: https://stackoverflow.com/questions/50260055/what-is-the-purpose-of-the-40h-rex-opcode-in-asm-x64