Why did they use numbers for register names in x86-64?

Question

AFAIK x86-64 adds a number of general purpose registers to those derived from Intel x86 (rax, rcx, etc), called r8-r15.

Why did they name the new registers like this? Why not just follow the existing naming convention and call them something like rfx, rgx ... ?

Answer 1:

Numbering CPU registers is the norm; almost every processor does that. The 8086, however, is ancient: its designers had an extremely limited transistor budget back in 1976. Implementing a 16-bit processor with only 20,000 active transistors was quite a tour de force. One way they cut down was by giving registers dedicated functions. At that point it made sense to give them names rather than numbers, hinting at their usage. Another influence was that the 8086 was designed to provide a level of compatibility with the 8080 processor, which also had named registers with dedicated functions.

The exact opposite design was the Motorola 68000, designed three years later with a more advanced process technology that permitted double the transistor budget. A very orthogonal design with (almost) every register freely usable in any instruction. And no compatibility with earlier designs. It had numbered registers (D0-D7 and A0-A7).

Extensions to the x86 architecture use numbered registers again, like R8 through R15, MM0 to MM7, XMM0-15, YMM0-15, etc.

Answer 2:

Why not just follow existing naming convention

Because the low 8 names aren't an arbitrary sequence or a convention, they're named for their specific purpose. r8-r15 don't have any specific purpose and almost no implicit uses or specialness. All of the original 8 registers have at least one instruction that uses that register implicitly. See https://www.swansontec.com/sregisters.html for what the names mean. (It's possible that EDX=data is a backronym, but A for accumulator and C for counter are clearly not a coincidence).

AMD64 was designed around 2000, and aims to be as orthogonal as possible so it's a better compiler target (compilers have an easier time when it doesn't matter which register a value is in).

By 2000 it was already well established that it's normal to have numbered registers when there's nothing special about them; all RISC ISAs and many more-recent CISC ISAs do that. (See @Hans' answer)

Early x86 extensions (especially 186 / 386) made the ISA more orthogonal than 8086 by adding multi-operand forms of imul (imul r, r/m and imul r, r/m, imm) that don't need EAX/AX, movsx as a non-EAX version of cbw (and also 8->32 in one instruction), and especially 32-bit addressing modes that allow any register for anything, not just [bx|bp] + [si|di] + disp0/8/16. But no new registers were added, and the implicit uses were not removed when more flexible ways were added, so in modern x86 the names are just reminders of the implicit uses, not what you have to use each register for.

Dave Christie (an AMD64 CPU architect at AMD) posted this on the x86-64.org mailing list on 2000-Sep-15, in reply to a discussion about renaming the old registers to R0..R7, or naming the upper registers UAX / ....

Figuring out how to best name the registers was actually one of the hardest parts of doing the register extension. The primary motivation for keeping the AX, BX, etc, nomenclature for the lower eight was exactly what Honza suggests [in an earlier message in the thread] -- there are various special-case uses of most of those registers, which experienced x86 programmers are very familiar with, and which are actually reflected in the mnemonics (A=accumulator, C=Count, SP=Stack Pointer, SI=Source Index, etc), which helps newbies remember these special uses.

There are some artifacts of this special-case functionality that are reflected in the upper registers, but only for instruction encoding¹ -- none of the special functionality is reflected, so it would really be misleading to use UAX, etc. We ended up naming them R8-R15, acknowledging that some people might prefer to think of the lower set as R0-R7, but never using such names in our documentation, to avoid confusion.

An assembler is free to define such aliases, although as Alex points out, we feel it would increase the chances of confusion and mistakes if both sets of names were simultaneously allowed. So if such aliases are defined, I'd recommend it be done in a way that a programmer be allowed to enable only one set or the other.

(¹ Footnote added by me: the upper-register special cases are in addressing-mode encodings. R13 is like RBP and can't be a base with no displacement. R12 is like RSP and needs a SIB byte as a base, but unlike RSP it can be an index. See the bottom of my answer on "Why are rbp and rsp called general purpose registers?")
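To make the footnote concrete, here is a minimal Python sketch of how the memory operand [base] with no displacement gets encoded in 64-bit mode. The helper name encode_base_only and its return shape are made up for illustration; only the ModRM / SIB bit layouts come from the ISA. It shows that R12 and R13 inherit the encoding quirks of RSP and RBP because only the low 3 bits of the register number are visible in the ModRM byte.

```python
# Hypothetical helper (not from any assembler): encode the ModRM (+SIB, +disp8)
# bytes for the memory operand [base] with no displacement, in 64-bit mode.
# Bit 3 of the register number lives in the REX prefix (REX.B), so it is
# returned separately from the ModRM/SIB/disp bytes.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def encode_base_only(base, reg_field=0):
    num = REG64.index(base)
    rex_b = num >> 3      # goes into the REX prefix
    low3 = num & 7        # the only part ModRM can see
    if low3 == 4:
        # rm=100 means "a SIB byte follows", so RSP and R12 need one:
        # scale=00, index=100 (no index), base=100 -> 0x24.
        modrm = (0b00 << 6) | (reg_field << 3) | 0b100
        return rex_b, bytes([modrm, 0x24])
    if low3 == 5:
        # mod=00 rm=101 means RIP-relative disp32, so RBP and R13 have to
        # use mod=01 with an explicit disp8 of zero instead.
        modrm = (0b01 << 6) | (reg_field << 3) | 0b101
        return rex_b, bytes([modrm, 0x00])
    # Every other base register fits in a single ModRM byte.
    modrm = (0b00 << 6) | (reg_field << 3) | low3
    return rex_b, bytes([modrm])


for name in ("rax", "rsp", "r12", "rbp", "r13", "r14"):
    rex_b, tail = encode_base_only(name)
    print(f"[{name}]: REX.B={rex_b} bytes={tail.hex(' ')}")
```

So [r12] and [r13] each cost one extra byte compared to [r14], exactly like [rsp] and [rbp] do compared to [rax].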

The x86-64.org mailing list archives have some interesting discussions between gcc and (Linux) kernel developers, and AMD64 architects. If you've ever wondered exactly how the x86-64 System V calling convention was designed, and why it passes the first few args in rdi, rsi, rdx, rcx, it turns out that Jan (Honza) Hubicka designed it based on (dynamic) instruction counts and (static) code-size for SPECint using a build of then-current gcc which maybe liked to inline rep movs for small copies. See "Why does Windows64 use a different calling convention from all other OSes on x86-64?" for mailing list archive links and more details.

Nobody in that discussion suggested using letters like rfx and so on.

It would have been an alphabet soup, and a separate naming scheme makes it really easy to distinguish new registers from old. This lets you see when an instruction needs a REX prefix or not (smaller code-size is almost always better). e.g. mov eax, edx is 1 byte shorter than mov eax, r8d.
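The code-size point can be checked by hand-encoding the two instructions. Below is a small Python sketch; encode_mov_r32_r32 is a hypothetical helper name, and it assumes the 89 /r form of mov (an assembler could equally pick 8B /r, with the same lengths).

```python
# Register numbers are the list indices; r8d-r15d are simply numbers 8-15.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
         "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"]


def encode_mov_r32_r32(dst, src):
    """Encode `mov dst, src` with opcode 89 /r (MOV r/m32, r32).

    Hypothetical helper for illustration: src goes in ModRM.reg and dst in
    ModRM.r/m. A REX prefix is emitted only when a register number needs
    its 4th bit (REX.R extends reg, REX.B extends r/m).
    """
    d, s = REG32.index(dst), REG32.index(src)
    rex = 0x40 | ((s >> 3) << 2) | (d >> 3)
    modrm = 0xC0 | ((s & 7) << 3) | (d & 7)   # mod=11: register-direct
    prefix = bytes([rex]) if rex != 0x40 else b""
    return prefix + bytes([0x89, modrm])


for dst, src in (("eax", "edx"), ("eax", "r8d"), ("r8d", "eax")):
    code = encode_mov_r32_r32(dst, src)
    print(f"mov {dst}, {src}: {code.hex(' ')} ({len(code)} bytes)")
```

The numbered name is a visible hint that the extra prefix byte will be there.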

And also, a REX prefix means you can't access AH/CH/DH/BH, so if you're using those byte regs you have to keep track of what you're doing. (e.g. you can unpack bytes from a qword with movzx r8d, bl / movzx ecx, bh / shr rbx, 16, but you can't movzx r9d, bh (REX.R for a high reg) or movsx rcx, bh (REX.W for 64-bit destination).)
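The rule behind those examples can be written down as a small check. This is only a sketch in Python with made-up helper names (needs_rex, encodable), and it assumes ordinary mov / movzx / ALU-style instructions where a 64-bit operand size always implies REX.W (that is not true for e.g. push / pop).

```python
import re

HIGH8 = {"ah", "ch", "dh", "bh"}
NEW_LOW8 = {"spl", "bpl", "sil", "dil"}   # only reachable with a REX prefix


def needs_rex(reg):
    """True if naming this register forces a REX prefix (simplified model)."""
    if re.fullmatch(r"r(8|9|1[0-5])[dwb]?", reg):
        return True                # r8-r15 at any width
    if reg in NEW_LOW8:
        return True
    # rax, rbx, ...: 64-bit operand size, assumed here to need REX.W.
    return reg.startswith("r")


def encodable(*regs):
    """AH/CH/DH/BH cannot appear in any instruction that has a REX prefix."""
    rex = any(needs_rex(r) for r in regs)
    return not (rex and any(r in HIGH8 for r in regs))


print(encodable("r8d", "bl"))   # movzx r8d, bl
print(encodable("ecx", "bh"))   # movzx ecx, bh
print(encodable("r9d", "bh"))   # movzx r9d, bh
print(encodable("rcx", "bh"))   # movsx rcx, bh
```

With a REX prefix present, byte-register numbers 4-7 select SPL/BPL/SIL/DIL instead of AH/CH/DH/BH, which is why the last two combinations have no encoding.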

Making it easy to see / remember which regs are new is also helpful for kernel developers, e.g. in a kernel entry-point from 32-bit user-space, the valuable user-space state is only in eax..edi, and it's easy to remember that r8-r15 are "new" registers that 32-bit code can't touch.

This may seem minor now, but when an ISA is new all the asm programmers have to learn it. The architects at AMD working on AMD64 put a lot of thought into naming schemes, and IMO did a nice job.

Related:

"What do the E and R prefixes stand for in the names of Intel 32-bit and 64-bit registers?"
"Why are first four x86 GPRs named in such unintuitive order?" on retrocomputing: how designing 8086 for easy source-porting of 8080 asm influenced the design of the register set.
"The start of x86: Intel 8080 vs Intel 8086?" explains that lahf / sahf exist for more efficient 8080 compatibility.
Where original x86 register names come from, like SI = source index.
My answer on "Why are rbp and rsp called general purpose registers?": a (non-exhaustive) list of implicit uses and special-ness for each x86-64 integer register, including in addressing modes. (Fun fact: one of the few implicit uses of RBX in compiler-generated code is cmpxchg16b.)

Answer 3:

Why not just follow existing naming convention and call them like rfx, rgx ... ?

In addition to the answers given: Your assumption that the registers of x86 CPUs are named A-B-C-D-... is not true:

The 8086 was introduced before the M68000, and the registers of nearly all CPUs before the M68000 were named by function.

The same was true for the 8086:

R0 = AX = Accumulator
R1 = CX = Counter
R2 = DX = Data
R3 = BX = (memory) Base
...

Please also note that the registers are not named AX-BX-CX-DX, but AX-CX-DX-BX!
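You can see that order directly in the machine code. As a small illustration (a Python sketch; push_opcode is a made-up helper name), the one-byte push r16 instruction is encoded as 0x50 plus the register number, so the register numbers fall out of the opcodes:

```python
# 16-bit registers in encoding order: the list index is the register number.
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def push_opcode(reg):
    """Opcode byte of `push reg` (short form 0x50 + register number)."""
    return 0x50 + REG16.index(reg)


for reg in REG16:
    print(f"R{REG16.index(reg)} = {reg.upper()}: push {reg} -> {push_opcode(reg):#04x}")

# Alphabetical order is not encoding order: bx is number 3, not 1.
print(sorted(REG16[:4]), "vs", REG16[:4])
```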

Explanation about the functions:

Accumulator: 8-bit CPUs could not perform an operation on arbitrary registers (like add dl, bl). Instead, the "A" register was the implicit "left" operand of an operation: the Z80 instruction "add 30" would be written as "add al, 30" in x86 syntax. ax was intended to be used in such situations, and some operations (mul) still implicitly use the ax register.
Counter: This register is used in loop and rep - so it is used for counting.
Base: The bx register can be used to address memory (like mov ax, [bx+40]), while 16-bit x86 addressing modes cannot use ax, cx or dx to address memory (unlike the 32-bit variants eax, ecx and edx, but the 8086 did not have 32-bit registers). So bx may hold the base address of some data structure in memory, and you can access the fields of that structure relative to it.
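The reason ax, cx and dx can't address memory on the 8086 is that the 3-bit r/m field of the ModRM byte only selects from a fixed table of eight base/index combinations. A rough Python sketch of that table follows; the helper names rm_field and encode_mov_ax_mem are hypothetical, and only the disp8 form is handled.

```python
# The eight 16-bit addressing modes, indexed by the ModRM r/m field.
RM16 = [("bx", "si"), ("bx", "di"), ("bp", "si"), ("bp", "di"),
        ("si",), ("di",), ("bp",), ("bx",)]


def rm_field(*regs):
    """Return the r/m value for a register combination, or None if none exists."""
    wanted = frozenset(regs)
    for rm, combo in enumerate(RM16):
        if frozenset(combo) == wanted:
            return rm
    return None


def encode_mov_ax_mem(disp8, *regs):
    """Encode `mov ax, [regs + disp8]` as 8B /r with mod=01 (8-bit displacement)."""
    rm = rm_field(*regs)
    if rm is None:
        raise ValueError("not a valid 16-bit addressing mode: " + "+".join(regs))
    modrm = (0b01 << 6) | (0 << 3) | rm    # reg field 0 = ax
    return bytes([0x8B, modrm, disp8 & 0xFF])


print(encode_mov_ax_mem(40, "bx").hex(" "))   # mov ax, [bx+40]
print(rm_field("bx", "si"))                   # valid: base + index
print(rm_field("ax"))                         # no encoding for [ax]
```

Only bx and bp (bases) and si and di (indexes) ever appear in the table, which matches the names "base" and "index".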

Source: https://stackoverflow.com/questions/12770378/why-did-they-use-numbers-for-register-names-in-x86-64

Tags

assembly

x86-64

cpu-registers

isa