What was the original reason for the design of AT&T assembly syntax? [closed]

Question

When writing assembly for x86 or amd64, a programmer can use "Intel" syntax (e.g. the nasm assembler) or "AT&T" syntax (e.g. the gas assembler). "Intel" syntax is more popular on Windows, while "AT&T" syntax is more popular on UNIX(-like) systems.

But the Intel and AMD manuals, i.e. the manuals written by the creators of the chips, both use the "Intel" syntax.

I'm wondering: what was the original idea behind the design of the "AT&T" syntax? What was the benefit of moving away from the notation used by the creators of the processor?

Answer 1:

UNIX was for a long time developed on the PDP-11, a 16-bit computer from DEC with a fairly simple instruction set. Nearly every instruction has two operands, each of which can use one of the following eight addressing modes, shown here in the MACRO-11 assembly language:

0n  Rn        register
1n  (Rn)      deferred
2n  (Rn)+     autoincrement
3n  @(Rn)+    autoincrement deferred
4n  -(Rn)     autodecrement
5n  @-(Rn)    autodecrement deferred
6n  X(Rn)     index
7n  @X(Rn)    index deferred

Immediates and direct addresses can be encoded by cleverly re-using some addressing modes on R7, the program counter:

27  #imm      immediate
37  @#imm     absolute
67  addr      relative
77  @addr     relative deferred

As the UNIX tty driver used @ and # as control characters, $ was substituted for # and * for @.

The first operand in a PDP-11 instruction word refers to the source operand while the second operand refers to the destination. This is reflected in the assembly language's operand order, which is source, then destination. For example, the opcode

011203

refers to the instruction

mov (R2),R3

which moves the word pointed to by R2 to R3.
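
The encoding of mov (R2),R3 can be worked out from the mode table above. Here is a minimal Python sketch of that packing, assuming the usual double-operand layout (a 4-bit opcode, then mode and register for the source, then mode and register for the destination). The function name encode_double_operand is made up for illustration and is not taken from any real assembler.

```python
def encode_double_operand(opcode, src_mode, src_reg, dst_mode, dst_reg):
    """Pack a PDP-11 double-operand instruction into a 16-bit word."""
    if not 0 <= opcode <= 0o17:
        raise ValueError("opcode is a 4-bit field")
    for field in (src_mode, src_reg, dst_mode, dst_reg):
        if not 0 <= field <= 7:
            raise ValueError("modes and registers are 3-bit fields")
    return (
        (opcode << 12)
        | (src_mode << 9) | (src_reg << 6)   # source: mode 'm', register 'n'
        | (dst_mode << 3) | dst_reg          # destination: mode 'm', register 'n'
    )


MOV = 0o01  # word-sized move

# mov (R2),R3: source is mode 1 (deferred) on R2, destination is mode 0 (register) on R3
word = encode_double_operand(MOV, 1, 2, 0, 3)
print(format(word, "06o"))  # 011203
```

Reading the octal digits left to right gives the opcode (01), the source operand (12) and the destination operand (03), which is the same source-then-destination order as the assembly text.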

This syntax was adapted to the 8086 CPU and its addressing modes:

mr0 X(bx,si)  bx + si indexed
mr1 X(bx,di)  bx + di indexed
mr2 X(bp,si)  bp + si indexed
mr3 X(bp,di)  bp + di indexed
mr4 X(si)     si indexed
mr5 X(di)     di indexed
mr6 X(bp)     bp indexed
mr7 X(bx)     bx indexed
3rR R         register
0r6 addr      direct

Where m is 0 if there is no index, m is 1 if there is a one-byte index, m is 2 if there is a two-byte index and m is 3 if instead of a memory operand, a register is used. If two operands exist, the other operand is always a register and encoded in the r digit. Otherwise, r encodes another three bits of the opcode.
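
To make the octal mr? notation in the table concrete, here is a small Python sketch that builds the addressing byte from its three fields. The helper name modrm is my own; the register numbering used in the examples (ax = 0, cx = 1, dx = 2, bx = 3) is the standard 8086 numbering.

```python
def modrm(mod, reg, rm):
    """Build an 8086 addressing byte from the m, r and last digit of the table."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("m is 2 bits; r and the last digit are 3 bits each")
    return (mod << 6) | (reg << 3) | rm


# X(bx,si) with a one-byte X (m = 1), other operand dx (r = 2)
print(format(modrm(1, 2, 0), "03o"))  # 120

# register operand bx (m = 3, R = 3), other operand ax (r = 0)
print(format(modrm(3, 0, 3), "03o"))  # 303

# direct address (m = 0, last digit 6), other operand ax (r = 0)
print(format(modrm(0, 0, 6), "03o"))  # 006
```

Printing the byte in octal makes the three fields line up with the digits, which is why the table is written that way.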

Immediates aren't possible in this addressing scheme; every instruction that takes an immediate encodes that fact in its opcode. Immediates are spelled $imm, just like in the UNIX PDP-11 syntax.

While Intel always used a dst, src operand ordering for its assembler, there was no particularly compelling reason to adopt this convention, and the UNIX assembler was written to use the src, dst operand ordering known from the PDP-11.

The AT&T implementation of the 8087 floating point instructions is not entirely consistent with this ordering, possibly because Intel gave the two possible directions of non-commutative floating point instructions different mnemonics, which do not match the operand ordering used by AT&T's syntax.

The PDP-11 instructions jmp (jump) and jsr (jump to subroutine) jump to the address of their operand. Thus, jmp foo would jump to foo and jmp *foo would jump to the address stored in the variable foo, similar to how lea works on the 8086.

The syntax for the x86's jmp and call instructions was designed as if these instructions worked like on the PDP-11, which is why jmp foo jumps to foo and jmp *foo jumps to the value at address foo, even though the 8086 doesn't actually have deferred addressing. This has the convenience of syntactically distinguishing direct jumps from indirect jumps without requiring a $ prefix on every direct jump target, but it doesn't make a lot of sense logically.

The syntax was expanded to specify segment prefixes using a colon:

seg:addr

When the 80386 was introduced, this scheme was adapted to its new SIB addressing modes using a four-part generic addressing mode:

disp(base,index,scale)

where disp is a displacement, base is a base register, index is an index register and scale is 1, 2, 4, or 8, the factor by which the index register is multiplied. This is equivalent to the Intel syntax:

[disp+base+index*scale]
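
The correspondence between the two notations is mechanical enough to sketch in a few lines of Python. This is only a rough illustration: att_to_intel is a hypothetical helper, it handles just the disp(base,index,scale) form shown above, and it ignores segment prefixes and operand-size keywords.

```python
import re

# disp(base,index,scale), where every part except the parentheses is optional
_MEM = re.compile(
    r"^(?P<disp>[^()]*)"
    r"\((?P<base>[^,()]*)"
    r"(?:,(?P<index>[^,()]*)(?:,(?P<scale>[1248]))?)?\)$"
)


def att_to_intel(operand):
    """Rewrite an AT&T memory operand as Intel-style [disp+base+index*scale]."""
    m = _MEM.match(operand.replace(" ", ""))
    if m is None:
        raise ValueError(f"not a disp(base,index,scale) operand: {operand!r}")
    disp = m.group("disp")
    base = (m.group("base") or "").lstrip("%")    # Intel syntax has no % prefix
    index = (m.group("index") or "").lstrip("%")
    scale = m.group("scale")
    parts = []
    if disp:
        parts.append(disp)
    if base:
        parts.append(base)
    if index:
        parts.append(f"{index}*{scale}" if scale else index)
    return "[" + "+".join(parts) + "]"


print(att_to_intel("8(%ebx,%esi,4)"))  # [8+ebx+esi*4]
print(att_to_intel("-4(%ebp)"))        # [-4+ebp]
print(att_to_intel("(,%esi,4)"))       # [esi*4]
```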

Another remarkable feature of the PDP-11 is that most instructions are available in a byte and a word variant. Which one you use is indicated by a b or w suffix on the mnemonic, which directly toggles the first bit of the opcode:

 010001   movw r0,r1
 110001   movb r0,r1

This was also adapted for AT&T syntax, as most 8086 instructions are indeed available in a byte mode and a word mode. Later, the 80386 introduced 32-bit instructions (suffixed l for long) and the AMD K8 introduced 64-bit instructions (suffixed q for quad).
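
As a quick check of the "first bit" claim, this Python sketch flips the top bit of the word-sized opcode from the listing and gets the byte-sized one. The names BYTE_FLAG, byte_variant and SUFFIX_BITS are made up for illustration; the suffix widths are the ones described in this answer.

```python
BYTE_FLAG = 0o100000  # top bit (bit 15) of the 16-bit instruction word


def byte_variant(word_opcode):
    """Return the byte-sized form of a PDP-11 instruction that has one."""
    return word_opcode | BYTE_FLAG


print(format(byte_variant(0o010001), "06o"))  # 110001, i.e. movb r0,r1

# Operand widths, in bits, selected by the AT&T size suffixes on x86
SUFFIX_BITS = {"b": 8, "w": 16, "l": 32, "q": 64}
print(SUFFIX_BITS["l"])  # 32
```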

Last but not least, originally the convention was to prefix C language symbols with an underscore (as is still done on Windows) so you can distinguish a C function named ax from the register ax. When Unix System Laboratories developed the ELF binary format, they decided to get rid of this decoration. As there is no way to distinguish a direct address from a register otherwise, a % prefix was added to every register:

mov direct,%eax # move memory at direct to %eax
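
Taken together, the prefixes let a reader (or a parser) tell the operand kinds apart from the first character alone. A simplified Python sketch of that idea follows; classify_operand is a hypothetical helper, not how gas actually parses operands, and it only looks at the prefixes discussed in this answer.

```python
def classify_operand(operand):
    """Classify an AT&T operand by its prefix: $ immediate, % register, * indirect."""
    op = operand.strip()
    indirect = op.startswith("*")      # only meaningful on jmp/call targets
    if indirect:
        op = op[1:]
    if op.startswith("$"):
        kind = "immediate"
    elif op.startswith("%") and ":" not in op and "(" not in op:
        kind = "register"
    else:
        kind = "memory"                # bare symbols, disp(base,...), seg:addr
    return "indirect " + kind if indirect else kind


for text in ("direct", "%eax", "$42", "8(%ebp)", "*foo", "*%eax"):
    print(text, "->", classify_operand(text))
```

With the % prefix in place, a C symbol named eax and the register %eax can no longer be confused, which is exactly what dropping the underscore decoration required.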

And that's how we got today's AT&T syntax.

Answer 2:

Assembly language is defined by the assembler, the software that parses the assembly language. The only "standard" is the machine code, which has to match the processor. If you take 100 programmers and give them the machine code standard (without any assembly language hints), you will end up with somewhere between 1 and 100 different assembly languages. All of them will work perfectly well for all use cases of that processor (bare metal, operating system, application work), so long as each programmer makes a complete tool that fits in with a toolchain.

It is in the best interest of the creator of the instruction set (the machine code) to produce both a document describing the instruction set and an assembler, the first tool you need. They can contract it out or build it in house; either way doesn't matter. Having an assembler with a syntax, plus a document for the machine code that uses the assembler's syntax to connect the dots between the two, gives anyone possibly interested in that processor a starting point, as was the case with Intel and the 8086/88. But that doesn't mean that masm and tasm were completely compatible with Intel's assembler. Even if the syntax per instruction matched, the per-instruction syntax is only part of the assembly language; there is a lot of non-instruction syntax: directives, macro language, etc. And that was the DOS end of the world; there was also the UNIX end, and thus AT&T. The GNU folks at the time were at the UNIX end of the world, so it makes perfect sense that they used the AT&T syntax or a derivative of it, as they generally mess up the assembly language during a port. Perhaps there is an exception.

nasm and some others like it are an attempt to continue the masm syntax, as masm is a closed-source Microsoft tool (as was tasm, and whatever came with Borland C if that wasn't tasm as well). These might be open source now, but there is no need: I assume it is easier to write one from scratch than to try to port that code so that it builds with a modern compiler, and nasm already exists.

The why question is like asking me why you chose the pair of socks you chose this morning, or on any particular day. Your socks may not have as big an impact on the rest of the world, but the question is equally irrelevant and/or unanswerable. The answer goes back in part to asking 100 programmers to make an assembler for the same machine code definition. Some of these programmers may be experienced with assembly language and may choose to create an assembly language in the image of one they have used before, which means several of them will make ones that look pretty similar to each other. But the ones they used before may differ, so there would be groups of similar but still different languages. Then, in let's say 30 years, ask each one of those 100 people the why question, if they are still alive. It is like asking me why you chose to declare a variable the way you did in a program you wrote 30 years ago.

Source: https://stackoverflow.com/questions/42244028/what-was-the-original-reason-for-the-design-of-att-assembly-syntax

Tags: assembly, x86, intel, att