Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?

死守一世寂寞 2020-11-21 05:20

In the x86-64 Tour of Intel Manuals, I read

Perhaps the most surprising fact is that an instruction such as MOV EAX, EBX automatically ze

3 Answers
  • 2020-11-21 05:37

    It simply saves space, both in the encoded instructions and in the instruction set. You can load a small immediate value into a 64-bit register using the existing 32-bit instructions.

    It also saves you from having to encode an 8-byte immediate for MOV RAX, 42 when MOV EAX, 42 can be reused.

    This optimization is not as important for 8- and 16-bit operations (because they are smaller), and changing the rules there would also break old code.
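
    To make the size difference concrete, here is a minimal sketch that builds the machine-code bytes for the three ways of putting 42 into rax. The helper names are made up for illustration; the opcode bytes are the standard encodings (B8 for mov r32, imm32; REX.W C7 /0 for a sign-extended imm32; REX.W B8 for a full imm64).

```python
import struct

def mov_eax_imm32(value):
    """mov eax, imm32: opcode B8 + 4-byte little-endian immediate (5 bytes)."""
    return bytes([0xB8]) + struct.pack("<I", value)

def mov_rax_simm32(value):
    """mov rax, imm32 (sign-extended): REX.W 48, opcode C7, ModRM C0, 4-byte immediate (7 bytes)."""
    return bytes([0x48, 0xC7, 0xC0]) + struct.pack("<i", value)

def mov_rax_imm64(value):
    """mov rax, imm64: REX.W 48, opcode B8 + 8-byte immediate (10 bytes)."""
    return bytes([0x48, 0xB8]) + struct.pack("<Q", value)

# Print each encoding with its length so the three forms can be compared.
for name, encoding in [
    ("mov eax, 42", mov_eax_imm32(42)),
    ("mov rax, 42 (simm32)", mov_rax_simm32(42)),
    ("mov rax, 42 (imm64)", mov_rax_imm64(42)),
]:
    print(name, len(encoding), "bytes:", encoding.hex())
```

    Because the 32-bit write zeroes the upper half, the 5-byte form leaves the same value in rax as the longer ones for any immediate that fits in 32 unsigned bits.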

  • 2020-11-21 05:41

    Without zero extension to 64 bits, an instruction reading rax would have two dependencies for its rax operand: the instruction that last wrote eax and the instruction that wrote rax before it. That leaves two options. 1) The ROB would need entries for multiple dependencies on a single operand, which means more logic, more transistors, and more area, and execution would be slower while waiting on an unnecessary second dependency that might take ages to complete. 2) Alternatively, which I'm guessing is what happens with the 16-bit instructions, the allocation stage stalls (i.e. if the RAT has an active allocation for an ax write and an eax read appears, it stalls until the ax write retires).

    mov rdx, 1
    mov rax, 6
    imul rax, rdx
    mov rbx, rax
    mov eax, 7      ; independent of the multiply, so it can finish executing before imul rax, rdx
    mov rdx, rax    ; without zero extension, this would have to wait for both imul rax, rdx and mov eax, 7 before dispatch to the execution units, even though the higher order bits are identical anyway
    

    The only benefit of not zero extending is that the higher order bits of rax are preserved: for instance, if rax originally contains 0xffffffffffffffff, the result would be 0xffffffff00000007. There's very little reason for the ISA to make this guarantee at such an expense, and zeroed upper bits are more likely to be what the code actually needs, so zero extension saves the extra mov rax, 0. Because the result is always zero extended to 64 bits, compilers can rely on that guarantee, and in mov rdx, rax the rax operand only has to wait for its single dependency, so the instruction can begin executing sooner and retire, freeing up execution units. Furthermore, it allows more efficient zeroing idioms: xor eax, eax zeroes all of rax without requiring a REX byte.
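
    The dependency argument can be sketched with a toy model. This is only an illustration under simplifying assumptions, not how a real renamer works: each register remembers which earlier instructions produced its current contents, and a reader waits on all of them. The names PROGRAM and reader_dependencies are hypothetical.

```python
# Each entry: (text, destination register, bits written, source registers).
PROGRAM = [
    ("mov rdx, 1",    "rdx", 64, []),
    ("mov rax, 6",    "rax", 64, []),
    ("imul rax, rdx", "rax", 64, ["rax", "rdx"]),
    ("mov rbx, rax",  "rbx", 64, ["rax"]),
    ("mov eax, 7",    "rax", 32, []),
    ("mov rdx, rax",  "rdx", 64, ["rax"]),
]

def reader_dependencies(program, zero_extend_32):
    """For each instruction, list the indices of earlier instructions it waits on."""
    producers = {}  # register -> set of instruction indices making up its value
    deps = []
    for index, (_text, dest, width, sources) in enumerate(program):
        waits_on = set()
        for src in sources:
            waits_on |= producers.get(src, set())
        deps.append(sorted(waits_on))
        if width == 32 and not zero_extend_32:
            # Hypothetical merging write: the old upper half survives,
            # so the previous producers stay live alongside this one.
            producers[dest] = producers.get(dest, set()) | {index}
        else:
            # Full write, or a 32-bit write that zeroes the upper half:
            # this instruction is the only producer.
            producers[dest] = {index}
    return deps

print("zero-extending:", reader_dependencies(PROGRAM, True)[5])   # [4]
print("merging:       ", reader_dependencies(PROGRAM, False)[5])  # [2, 4]
```

    In this model the final mov rdx, rax waits only on mov eax, 7 when 32-bit writes zero extend, but on both the imul and the mov when they merge.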

  • 2020-11-21 05:48

    I'm not AMD or speaking for them, but I would have done it the same way, because zeroing the high half doesn't create a dependency on the previous value that the CPU would have to wait on. The register renaming mechanism would essentially be defeated if it weren't done that way.

    This way you can write fast code using 32-bit values in 64-bit mode without having to explicitly break dependencies all the time. Without this behaviour, every single 32-bit instruction in 64-bit mode would have to wait on something that happened before, even though that high part would almost never be used. (Making int 64-bit would waste cache footprint and memory bandwidth; x86-64 most efficiently supports 32- and 64-bit operand sizes.)

    The behaviour for 8 and 16-bit operand sizes is the strange one. The dependency madness is one of the reasons that 16-bit instructions are avoided now. x86-64 inherited this from 8086 for 8-bit and 386 for 16-bit, and decided to have 8 and 16-bit registers work the same way in 64-bit mode as they do in 32-bit mode.
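
    As a rough illustration of the difference between operand sizes, the sketch below models the architectural result of writing the low 8, 16, 32, or 64 bits of a 64-bit register. The function name write_reg is made up for this example, and the 8-bit case models only the low-byte registers (al and friends), not ah.

```python
MASK64 = (1 << 64) - 1

def write_reg(old, value, width):
    """Return the 64-bit register contents after a write of the given operand size."""
    if width == 64:
        return value & MASK64
    if width == 32:
        # 32-bit writes zero the upper 32 bits; the old value is irrelevant.
        return value & 0xFFFFFFFF
    if width in (8, 16):
        # 8- and 16-bit writes merge into the old value, keeping the upper bits.
        mask = (1 << width) - 1
        return (old & ~mask & MASK64) | (value & mask)
    raise ValueError("unsupported operand size: %r" % (width,))

old = 0xFFFFFFFFFFFFFFFF
print(hex(write_reg(old, 7, 32)))  # 0x7
print(hex(write_reg(old, 7, 16)))  # 0xffffffffffff0007
print(hex(write_reg(old, 7, 8)))   # 0xffffffffffffff07
```

    Only the 8- and 16-bit cases read `old`, which is exactly the extra input that creates the dependency on the previous register value.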


    See also "Why doesn't GCC use partial registers?" for practical details of how writes to 8 and 16-bit partial registers (and subsequent reads of the full register) are handled by real CPUs.
