Moving a value of a lesser size into a register

Question

I have stored a one-byte value of 8 and I'd like to move that into the rax register. I'm currently doing this with movzx to zero-extend the byte:

.globl main
main:
    push %rbp
    mov %rsp, %rbp
    movb $8, -1(%rbp)
    movzx -1(%rbp), %rax    # <-- here
    ...

How does the movzx instruction 'know' that the value at -1(%rbp) is only one byte long? From here it says, if I'm reading it properly, that it can work on both a byte and a word, but how would it know? For example, if I added a two-byte value at -2(%rbp), how would it know to grab the two-byte value? Is there another instruction where I can just grab a one-, two-, or four-byte value at an address and insert it into a 64-bit register?

I suppose another way to do it would be to first zero out the register and then move the value into its 8-bit (or however many bits) component, such as:

mov $0, %rax
mov -1(%rbp), %al

Is one way preferred over the other?

Answer 1:

How does the movzx instruction 'know' that the value at -1(%rbp) is only one byte long?

There are two (or even three) instructions:

movzxb (-1(%rbp) is one byte long) and movzxw (-1(%rbp) is one 16-bit word long).

My assembler interprets movzx as movzxb; however, you should not rely on that!

Better use the instruction name including the source size (movzxb or movzxw) to ensure that the assembler uses the correct instruction.

Answer 2:

It's ambiguous and relies on some default; you shouldn't write code like that.

That's why AT&T syntax has movzb and movzw instructions (typically used as movzbl -1(%rbp), %eax), for the two different source sizes of the Intel-syntax movzx mnemonic. See Are x86 Assembly Mnemonic standarized? (no, AT&T makes up new names.)

And yes, you could xor %eax,%eax / mov -1(%rbp), %al to merge into the low byte, but that's pointlessly inefficient. x86-64 guarantees the availability of 386 instructions like movzx.
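As a minimal sketch of the explicit-size forms, the program below extends the question's code with a second, two-byte local. The slot at -4(%rbp) and the value 0x1234 are assumptions chosen for illustration (a word stored at -2(%rbp) would overlap the byte at -1(%rbp)). The two-instruction merge is included only for comparison.

```asm
    .globl main
main:
    push   %rbp
    mov    %rsp, %rbp
    movb   $8, -1(%rbp)          # one-byte local, as in the question
    movw   $0x1234, -4(%rbp)     # hypothetical two-byte local

    movzbl -1(%rbp), %eax        # byte -> EAX, zero-extended; RAX = 8
    movzwl -4(%rbp), %ecx        # word -> ECX, zero-extended; RCX = 0x1234

    xor    %edx, %edx            # alternative: zero the register first...
    movb   -1(%rbp), %dl         # ...then merge the byte into its low 8 bits

    pop    %rbp
    ret                          # returns with EAX = 8
```

The mnemonic names the source size (b or w) and the destination register supplies the rest, so nothing is left for the assembler to guess.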

Surprisingly, movzx -1(%rbp), %rax does assemble. If you assemble it, then disassemble back into AT&T syntax with objdump -d foo.o, you get movzbq (byte to quad), including a useless REX prefix instead of letting implicit zero-extension do the job after writing EAX.

48 0f b6 45 ff          movzbq -0x1(%rbp),%rax

Or disassemble into Intel syntax with objdump -drwC -Mintel:

48 0f b6 45 ff          movzx  rax,BYTE PTR [rbp-0x1]

Fun fact: GAS can't infer movzb vs. movzw if you write just movz, because movz isn't an instruction mnemonic. Unlike operand-size suffixes that can be inferred from the operands, the b and w are treated as part of the mnemonic. But you can write movzx and then it will infer both sizes from register operands, just like in Intel-syntax mode.

   5:   0f b6 c0                movzbl %al,%eax         # source: movzx %al, %eax
   8:   0f b7 c0                movzwl %ax,%eax         # source: movzx %ax, %eax

movzw and movzb act like instruction mnemonics in their own right (that can infer a size suffix from the destination register). Semi-related: What does the MOVZBL instruction do in IA-32 AT&T syntax?

Also related: a table of cdq and so on equivalents in terms of movsx and AT&T equivalents: What does cltq do in assembly?

Also related: MOVZX missing 32 bit register to 64 bit register - because that's implicit in writing a 32-bit register.
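A short sketch of that implicit zero-extension follows. The four-byte local at -8(%rbp) and the constant 0xdeadbeef are assumptions made up for illustration. The encodings in the comments come from the disassembly shown above, minus the REX prefix for the 32-bit destination.

```asm
    .globl main
main:
    push   %rbp
    mov    %rsp, %rbp
    movb   $8, -1(%rbp)             # one-byte local
    movl   $0xdeadbeef, -8(%rbp)    # hypothetical four-byte local

    mov    $-1, %rax                # set all 64 bits of RAX
    movl   -8(%rbp), %eax           # writing EAX clears bits 63..32: RAX = 0x00000000deadbeef

    movzbq -1(%rbp), %rcx           # 48 0f b6 4d ff: REX.W prefix, RCX = 8
    movzbl -1(%rbp), %ecx           # 0f b6 4d ff: one byte shorter, RCX = 8 as well

    xor    %eax, %eax               # return 0
    pop    %rbp
    ret
```

A plain movl therefore covers the four-byte case from the question, and the 32-bit destination form of movzb or movzw is enough for the one- and two-byte cases.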

Source: https://stackoverflow.com/questions/63389129/moving-a-value-of-a-lesser-size-into-a-register

Tags: assembly, x86-64, att, zero-extension