Why operand must have size in one line but not the other in x86 assembly

Question

Looking at the picture, on line 34 I had to write the word ptr for this to work, while on line 44 I didn't.
Why is that?

Can't the compiler know that 0020h is a word just like 0FF20h is a word?
Adding 0 to 0020h making it 00020h or anything like that doesn't work either.

I am using MASM syntax on 80x86 in emu8086, and I also tried it in DOSBox v0.74.

Answer 1:

The difference is because your assembler strangely and dangerously accepts 0FF20h as implying word operand-size. But even for your assembler, leading zeros don't imply operand-size, just the actual value; presumably it checks the position of the most significant bit.

This is not the case for a well-designed and consistent assembler syntax like NASM. If I try to assemble the following in 16-bit mode with nasm -fbin foo.asm:

mov [es: si], 2
mov [es: si], 0ff20H

I get these errors:

foo.asm:1: error: operation size not specified
foo.asm:2: error: operation size not specified

Only a register can imply an operand-size for the whole instruction, not the width of a constant. (mov [si], ax is not ambiguous: there is no form of mov where the destination has a different width than the source, and ax is definitely word sized.)
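For reference, here is a minimal NASM sketch of how those stores can be written so that they do assemble in 16-bit mode. Only the size keywords are new; whether the first store should be a byte or a word is an assumption about what the programmer wanted, which is exactly why the assembler refuses to guess.

```nasm
bits 16

mov byte [es:si], 2        ; explicit byte store: writes 1 byte (02h)
mov word [es:si], 2        ; explicit word store: writes 2 bytes (02h, 00h)
mov word [es:si], 0ff20h   ; explicit word store: writes 2 bytes (20h, FFh)

mov [es:si], ax            ; no keyword needed: ax makes this a word store
mov [es:si], al            ; no keyword needed: al makes this a byte store
```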

The same applies to GAS (the GNU assembler), in both AT&T and Intel syntax modes. (Its Intel-syntax mode is very similar to MASM.)

There's no mov r/m16, sign_extended_imm8 encoding, but there is for add and most ALU operations, so there's no reason for an assembler to assume that xyz [mem], 0 means byte operand size. More likely the programmer forgot to specify, so it treats it as an error instead of silently accepting something ambiguous.

mov word [mem], 0 is a totally normal way to zero a word in memory.
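A short NASM-syntax sketch of why a small immediate says nothing about operand size. The [si] addressing is just an example, and the bytes in the comments are the standard x86 encodings (the short 83h form assumes NASM's default optimization, which picks the sign-extended imm8 when the value fits):

```nasm
bits 16

add byte [si], 1       ; 80 04 01      8-bit add with imm8
add word [si], 1       ; 83 04 01      16-bit add, imm8 sign-extended to 16 bits
add word [si], 1234h   ; 81 04 34 12   16-bit add with a full imm16

mov byte [si], 0       ; C6 04 00      zeroes one byte
mov word [si], 0       ; C7 04 00 00   zeroes two bytes; mov has no sign-extended imm8 form
```

The immediate 1 fits in a byte in both add forms, yet one modifies 8 bits of memory and the other 16, so the value alone cannot select the instruction.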

Besides all that, x86 supports 32-bit operand size in 16-bit code, using a 66h operand-size prefix. This is independent from the address-size.

mov dword ptr es:[si], 0FF20h is also encodable, and it is completely ambiguous with mov word ptr es:[si], 0FF20h if you leave out the size ptr specifier.
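To make the ambiguity concrete, here is a NASM-syntax sketch of the two stores in 16-bit code. The dword form needs a 386 or later, so it would not run on a true 8086; the byte counts in the comments follow directly from the operand size.

```nasm
bits 16

mov word  [es:si], 0ff20h   ; writes 2 bytes: 20h, FFh
mov dword [es:si], 0ff20h   ; adds a 66h operand-size prefix; writes 4 bytes: 20h, FFh, 00h, 00h
```

Both lines have the same address and the same immediate, so once the size keyword is removed nothing in the operands tells them apart.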

As Jester commented, if leading zeros counted as part of the width of the constant, 0FF20h could easily be taken as implying dword.

"Note that you had to write 0FF20H with a leading zero too, so if the assembler really relied on the length of the literal, it could have thought that was a dword ... similarly for 0FFH. It would be a dangerous game. Note sensible assemblers don't even allow your second form without explicit size. That's just a bug waiting to happen."

(Sensible assemblers include NASM and GAS, like I showed above).

If I were you, I'd be unhappy that my assembler accepted mov es:[si], 0FF20h without complaint. I thought emu8086 was even worse than MASM, and usually accepted stuff like mov [si], 2 with some default operand size instead of warning even then.

I'm not a big fan of how MASM magically infers operand-size from symbol db 1, 2, 3 either, but that's not ambiguous, it just means you have to look at how a symbol was declared to know what operand-size it will imply.

Source: https://stackoverflow.com/questions/49910109/why-operand-must-have-size-in-one-line-but-not-the-other-in-x86-assembly

Tags

assembly

x86

MASM

emu8086