Questions about AT&T x86 Syntax design

天涯浪人 2020-12-01 10:47
  1. Can anyone explain to me why every constant in AT&T syntax has a '$' in front of it?
  2. Why do all registers have a '%'?
  3. Is this just another attempt to get me to do a lot of lame typing?
4 answers
  •  时光说笑
    2020-12-01 11:31

    The GNU assembler's AT&T syntax traces its origins to the Unix assembler [1], which itself took its input syntax mostly from the PDP-11 PAL-11 assembler (ca. 1970).

    Can anyone explain to me why every constant in AT&T syntax has a '$' in front of it?

    It distinguishes immediate constants from memory addresses. Intel syntax does it the other way around, marking memory references with brackets, as in [foo].
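
    As a rough illustration of that rule, here is a minimal Python sketch of how an operand could be classified from its first character alone. The function name classify_operand is made up for this example, and the sketch deliberately ignores corner cases such as segment overrides and the * prefix on indirect jump targets; it is not how gas is actually implemented.

```python
def classify_operand(operand: str) -> str:
    """Classify an AT&T-syntax operand by its leading character (simplified)."""
    operand = operand.strip()
    if operand.startswith("$"):
        return "immediate"  # $foo or $42: the value itself
    if operand.startswith("%"):
        return "register"   # %eax
    return "memory"         # bare foo, or 16(%esp): the contents at an address


for text in ("$foo", "foo", "%eax", "16(%esp)"):
    print(f"{text:10} -> {classify_operand(text)}")
```

    The point is that $foo and foo never need to be told apart by looking up what foo is: the prefix alone decides.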

    Incidentally, MASM (the Microsoft Assembler) doesn't need a distinction at the syntax level, since it can tell whether the operand is a symbolic constant or a label. Other x86 assemblers actively avoid such guesses, since they can confuse readers, e.g. TASM in IDEAL mode (which warns on memory references not in brackets), NASM, and FASM.

    PAL-11 used # for the Immediate addressing mode, where the operand followed the instruction. A constant without # meant Relative addressing mode, where a relative address followed the instruction.

    The Unix assembler, as, used the same addressing-mode syntax as the DEC assemblers, but with * instead of @ and $ instead of #, since @ and # were apparently inconvenient to type [2].

    Why do all registers have a '%'?

    In PAL-11, registers were defined as R0=%0, R1=%1, ... with R6 also referred to as SP, and R7 also referred to as PC. The DEC MACRO-11 macro-assembler allowed referring to registers as %x, where x could be an arbitrary expression, e.g. %3+1 referred to %4.

    Is this just another attempt to get me to do a lot of lame typing?

    Nope.

    Also, am I the only one that finds: 16(%esp) really counterintuitive compared to [esp+16]?

    This comes from the PDP-11 Index addressing mode, where a memory address is formed by summing the contents of a register and an index word following the instruction.
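
    To make the correspondence concrete, here is a small Python sketch, under the assumption that the operand has the simple form disp(%base) with a decimal displacement. The helper names att_to_intel and effective_address are hypothetical, and scaled-index forms such as (%eax,%ebx,4) are not handled.

```python
import re

# disp(%base): an optional signed decimal displacement, then a register in parentheses.
INDEXED = re.compile(r"^(-?\d+)?\(%(\w+)\)$")


def parse_indexed(operand: str):
    """Return (displacement, base register name) for a disp(%base) operand."""
    match = INDEXED.match(operand.strip())
    if match is None:
        raise ValueError(f"not a disp(%base) operand: {operand!r}")
    disp = int(match.group(1)) if match.group(1) else 0
    return disp, match.group(2)


def att_to_intel(operand: str) -> str:
    """Rewrite disp(%base) in Intel bracket form, e.g. [base+disp]."""
    disp, base = parse_indexed(operand)
    if disp == 0:
        return f"[{base}]"
    sign = "+" if disp > 0 else "-"
    return f"[{base}{sign}{abs(disp)}]"


def effective_address(operand: str, registers: dict) -> int:
    """Sum the register contents and the displacement, as the index mode does."""
    disp, base = parse_indexed(operand)
    return registers[base] + disp


print(att_to_intel("16(%esp)"))                              # [esp+16]
print(hex(effective_address("16(%esp)", {"esp": 0x1000})))   # 0x1010
```

    Both spellings name the same sum; the AT&T form simply puts the index word first and the register in parentheses, following the PDP-11 layout.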

    I know it compiles to the same thing but why would anyone want to type a lot of '$' and '%'s without a need to? - Why did GNU choose this syntax as the default?

    It came from the PDP-11.

    Another thing, why is every instruction in at&t syntax preceded by an: l? - I do know its for the operand sizes, however why not just let the assembler figure that out? (would I ever want to do a movl on operands that are not that size?)

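    gas can usually figure it out from a register operand, so the suffix is often optional. It is needed only when no operand fixes the size, for example when storing an immediate to memory. Other assemblers also need help in such cases.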

    The PDP-11 used a b suffix for byte instructions, e.g. CLR vs. CLRB. Other suffixes appeared in VAX-11: l for long, w for word, f for float, d for double, q for quad-word, ...
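
    Here is a hedged Python sketch of when the suffix matters on x86, where gas uses b, w, l and q for 1, 2, 4 and 8 bytes. The tables cover only a tiny illustrative subset of mnemonics and registers, and the function name operand_size is invented for this example; real assemblers do considerably more.

```python
SUFFIX_BYTES = {"b": 1, "w": 2, "l": 4, "q": 8}          # x86 AT&T size suffixes
BASE_MNEMONICS = {"mov", "add", "sub", "cmp"}            # illustrative subset
REGISTER_BYTES = {"al": 1, "ax": 2, "eax": 4, "rax": 8}  # illustrative subset


def operand_size(mnemonic: str, operands: list):
    """Return the operand size in bytes, or None when it cannot be inferred."""
    # An explicit suffix settles the question.
    if mnemonic[:-1] in BASE_MNEMONICS and mnemonic[-1] in SUFFIX_BYTES:
        return SUFFIX_BYTES[mnemonic[-1]]
    if mnemonic not in BASE_MNEMONICS:
        raise ValueError(f"unknown mnemonic: {mnemonic!r}")
    # Without a suffix, a plain register operand implies the size.
    for operand in operands:
        if operand.startswith("%") and operand[1:] in REGISTER_BYTES:
            return REGISTER_BYTES[operand[1:]]
    return None  # e.g. immediate to memory: nothing fixes the size


print(operand_size("mov", ["$1", "%eax"]))     # 4, inferred from %eax
print(operand_size("mov", ["$1", "(%eax)"]))   # None, ambiguous
print(operand_size("movb", ["$1", "(%eax)"]))  # 1, from the b suffix
```

    Note that (%eax) does not count as a register operand here: the register only holds the address, so it says nothing about how many bytes to store.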

    Last thing: why are the mov arguments inverted?
    

    Arguably, since the PDP-11 predates Intel microprocessors, it is Intel's operand order that is inverted.


    [1] According to the gas info page, via the BSD 4.2 assembler.
    [2] Dennis M. Ritchie, Unix Assembler Reference Manual, §8.1.
