When should I use size directives in x86?

假装没事ソ submitted on 2019-12-29 00:07:14

Question


When to use size directives in x86 seems a bit ambiguous. This x86 assembly guide says the following:

In general, the intended size of the data item at a given memory address can be inferred from the assembly code instruction in which it is referenced. For example, in all of the above instructions, the size of the memory regions could be inferred from the size of the register operand. When we were loading a 32-bit register, the assembler could infer that the region of memory we were referring to was 4 bytes wide. When we were storing the value of a one byte register to memory, the assembler could infer that we wanted the address to refer to a single byte in memory.

The examples they give are pretty trivial, such as mov'ing an immediate value into a register.
But what about more complex situations, such as the following:

mov    QWORD PTR [rip+0x21b520], 0x1

In this case, isn't the QWORD PTR size directive redundant since, according to the above guide, it can be assumed that we want to move 8 bytes into the destination register due to the fact that RIP is 8 bytes? What are the definitive rules for size directives on the x86 architecture? I couldn't find an answer for this anywhere, thanks.

Update: As Ross pointed out, the destination in the above example isn't a register. Here's a more relevant example:

mov    esi, DWORD PTR [rax*4+0x419260] 

In this case, can't it be assumed that we want to move 4 bytes because ESI is 4 bytes, making the DWORD PTR directive redundant?


Answer 1:


You're right; it is rather ambiguous. Assuming we're talking about Intel syntax, it is true that you can often get away with not using size directives. Any time the assembler can figure it out automatically, they are optional. For example, in the instruction

mov    esi, DWORD PTR [rax*4+0x419260] 

the DWORD PTR specifier is optional for exactly the reason you suppose: the assembler can figure out that it is to move a DWORD-sized value, since the value is being moved into a DWORD-sized register.

Similarly, in

mov    rsi, QWORD PTR [rax*4+0x419260] 

the QWORD PTR specifier is optional for the exact same reason.

But it is not always optional. Consider your first example:

mov    QWORD PTR [rip+0x21b520], 0x1

Here, the QWORD PTR specifier is not optional. Without it, the assembler has no idea what size value you want to store starting at the address rip+0x21b520. Should 0x1 be stored as a BYTE? Extended to a WORD? A DWORD? A QWORD? Some assemblers might guess, but you can't be assured of the correct result without explicitly specifying what you want.

In other words, when one of the operands is a register, the size specifier is optional because the assembler can figure out the size based on the size of the register. However, if the only operands are an immediate value and a memory operand, nothing in the instruction fixes the size, so you need the size specifier to be sure of getting the results you want.
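The rule above can be sketched as a small Python model. This is only an illustration of the reasoning for a two-operand mov, not the algorithm of any real assembler; the function name infer_mov_size, the two lookup tables, and the string format of the operands are all assumptions made up for this example.

```python
# Hypothetical, simplified model of how an Intel-syntax assembler could pick
# the operand size for a two-operand `mov`. Not any real assembler's code.

REGISTER_BYTES = {
    "al": 1, "bl": 1,
    "ax": 2, "bx": 2,
    "eax": 4, "esi": 4,
    "rax": 8, "rsi": 8,
}
DIRECTIVE_BYTES = {"BYTE": 1, "WORD": 2, "DWORD": 4, "QWORD": 8}


def infer_mov_size(dest, src, directive=None):
    """Return the operand size in bytes, or raise ValueError if it is unclear.

    Operands are plain strings: a register name ("esi"), a memory reference
    in brackets ("[rax*4+0x419260]"), or an immediate ("0x1").
    """
    # Registers inside the brackets only form the address, so only operands
    # that are themselves registers contribute a size.
    register_sizes = {
        REGISTER_BYTES[op] for op in (dest, src) if op in REGISTER_BYTES
    }
    if len(register_sizes) > 1:
        raise ValueError("register operands have different sizes")

    explicit = DIRECTIVE_BYTES[directive] if directive else None
    if register_sizes:
        size = register_sizes.pop()
        if explicit is not None and explicit != size:
            raise ValueError("size directive conflicts with register size")
        return size
    if explicit is not None:
        return explicit
    raise ValueError("ambiguous operand size: add a size directive")
```

In this model, infer_mov_size("esi", "[rax*4+0x419260]") returns 4 without any directive, while infer_mov_size("[rip+0x21b520]", "0x1") raises ValueError until a directive such as "QWORD" is passed. The mismatch check mirrors the point made below about the assembler complaining when the stated size and the register disagree.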

Personally, I prefer to always include the size when I write code. It's a couple of characters more typing, but it forces me to think about it and state explicitly what I want. If I screw up and code a mismatch, then the assembler will scream loudly at me, which has caught bugs more than once. I also think having it there enhances readability. So here I agree with old_timer, even though his perspective appears to be somewhat unpopular.

Disassemblers also tend to be verbose in their outputs, including the size specifiers even when they are optional. Hans Passant theorized in the comments this was to preserve backwards-compatibility with old-school assemblers that always needed these, but I'm not sure that's true. It might be part of it, but in my experience, disassemblers tend to be wordy in lots of different ways, and I think this is just to make it easier to analyze code with which you are unfamiliar.

Note that AT&T syntax uses a slightly different tack. Rather than writing the size as a prefix to the operand, it adds a suffix to the instruction mnemonic: b for byte, w for word, l for dword, and q for qword. So, the three previous examples become:

movl    0x419260(,%rax,4), %esi
movq    0x419260(,%rax,4), %rsi
movq    $0x1, 0x21b520(%rip)

Again, on the first two instructions, the l and q suffixes are optional, because the assembler can deduce the appropriate size from the register operand. On the last instruction, just like in Intel syntax, the suffix is not optional. So AT&T syntax behaves the same as Intel syntax; only the format of the size specifier differs.




Answer 2:


RIP, or any other register in the address, is only relevant to the addressing mode, not to the width of the data transferred. The memory reference [rip+0x21b520] could be used with a 1, 2, 4, or 8-byte access, and the constant value 0x01 could also be 1 to 8 bytes (0x01 is the same as 0x00000001 etc.). So in this case, the operand size has to be explicitly mentioned.
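To see why the width matters, here is a minimal Python sketch that emulates what each store width would do to eight bytes of memory pre-filled with 0xff. It is not assembler output; the names store, SIZES, and demo are made up for illustration, and the little-endian byte order is the x86 convention.

```python
# Emulate storing the immediate 0x1 at the same address with each of the
# four widths an x86-64 `mov [mem], imm` could use.

SIZES = {"BYTE": 1, "WORD": 2, "DWORD": 4, "QWORD": 8}


def store(memory, offset, value, width):
    """Overwrite `width` bytes at `offset` with `value`, little-endian."""
    memory[offset:offset + width] = value.to_bytes(width, "little")


def demo():
    """Return the resulting memory contents (as hex) for each store width."""
    results = {}
    for name, width in SIZES.items():
        memory = bytearray(b"\xff" * 8)  # pre-existing data at the address
        store(memory, 0, 0x1, width)
        results[name] = memory.hex()
    return results


if __name__ == "__main__":
    for name, contents in demo().items():
        print(f"{name:5} -> {contents}")
```

A BYTE store leaves the other seven bytes untouched (01ffffffffffffff), while a QWORD store clears all of them (0100000000000000), so the same source line means different things depending on the size the assembler assumes.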

With a register as the source or destination, the operand size would be implicit: if, say, EAX is used, the data is 32 bits or 4 bytes:

mov    [rip+0x21b520],eax

And of course, in the awfully beautiful AT&T syntax, the operand size is marked as a suffix to the instruction mnemonic (the l here).

movl $1, 0x21b520(%rip) 



Answer 3:


It gets worse than that. An assembly language is defined by the assembler: the program that reads, interprets, and parses it. As a general rule, and for x86 in particular, there is no technical reason for any two assemblers for the same target to accept the same assembly language. They tend to be similar, but they don't have to be.

You have fallen into a couple of traps: first, the specific syntax your assembler uses for the size directive, and second, whether there is a default size when you omit it. My recommendation is to ALWAYS use the size directive (or a unique instruction mnemonic, if there is one); then you never have to worry about it, right?



Source: https://stackoverflow.com/questions/44577130/when-should-i-use-size-directives-in-x86