x86 OpCode Instruction Decoding

Question

I've been reading the x86 Software Developer's Manual, trying to brush up on my reverse-engineering skills. I know that the architecture is complex and backwards compatible with previous generations. With newer generations, some parts of the older documentation get left out. One notation I found confusing, and misinterpreted, appears in instructions like this one:

80 /2 ib

So instructions with the 80 opcode are followed by a MOD/RM/REG byte. As a side effect of the older material being dropped, I had no idea there were different versions of the MOD/RM/REG byte, although I always assumed there was a difference because of the architecture's 8/16-bit roots. Thankfully, I stumbled on a breakdown of the MOD/RM/REG byte from the architecture's original introduction.

So, for an instruction like the one above, I read the forward slash followed by a digit as saying that the RM field of the MOD/RM/REG byte would contain the octal value 2.

My actual question(s) are the following:

Does the MOD field of the MOD/RM/REG byte accept all addressing modes in this case, or are there restrictions? Also, does anybody know why the digit is specified as /2? Is that a reason to assume that lower values were used in older generations of the ISA and are preserved for backwards compatibility?

Answer 1:

You should have read CHAPTER 2 INSTRUCTION FORMAT in the manual. In brief, the /digit notation means the reg/opcode field of the ModR/M byte is used as an opcode extension holding the given value. The manual says: "The reg/opcode field specifies either a register number or three more bits of opcode information." See also Table 2-2, 32-Bit Addressing Forms with the ModR/M Byte.
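To make the field layout concrete, here is a minimal Python sketch (the name `split_modrm` is just an illustration, not something from the manual) that pulls the three fields out of a ModR/M byte. For a /digit instruction, the middle field is where the digit lives:

```python
def split_modrm(byte: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into its (mod, reg, rm) fields.

    Bits 7-6 are mod, bits 5-3 are reg/opcode, bits 2-0 are rm.
    For a /digit instruction, the reg field holds the digit.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("ModR/M must be a single byte")
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


# 0x10 = 00 010 000: mod=0, reg (the /digit) = 2, rm=0
print(split_modrm(0x10))  # (0, 2, 0)
```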

The opcode extension is used when there is no second register operand, such as with an immediate, as in your example, which is ADC r/m8, imm8. Other instructions share the main opcode 80 but use a different extension. In Table A-6, Opcode Extensions for One- and Two-byte Opcodes by Group Number, you can see that extensions 0 to 7 correspond to ADD, OR, ADC, SBB, AND, SUB, XOR and CMP, respectively.
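A lookup along these lines could turn the reg field into a Group 1 mnemonic. This is a sketch assuming a plain tuple indexed by the 3-bit field; `GROUP1` and `group1_mnemonic` are hypothetical names. The same group also covers opcodes 81 and 83, which differ only in operand and immediate size:

```python
# Group 1 opcode extensions (Table A-6), indexed by the ModR/M reg field.
GROUP1 = ("ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP")


def group1_mnemonic(modrm: int) -> str:
    """Return the Group 1 mnemonic selected by bits 5-3 of a ModR/M byte."""
    return GROUP1[(modrm >> 3) & 0b111]


print(group1_mnemonic(0x10))  # ADC  (reg field = 2)
```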

Also note that ModR/M, and thus this encoding scheme, is used in 32- and 64-bit code too, so it is not obsolete. For example, in 32-bit code ADC BYTE PTR [eax], 0x42 has the machine code 80 10 42, where 80 is the main opcode, 10 is the ModR/M byte with 2 in the reg field and [eax] as the operand, and 42 is the immediate.
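Going the other way, the encoding in the example can be reproduced by packing the fields by hand. This is a hypothetical helper covering only the simple case: it refuses forms that would need a SIB byte or a displacement, since those add bytes between the ModR/M byte and the immediate.

```python
def encode_modrm(mod: int, reg: int, rm: int) -> int:
    """Pack mod (2 bits), reg (3 bits) and rm (3 bits) into one ModR/M byte."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("field out of range")
    return (mod << 6) | (reg << 3) | rm


def encode_group1_rm8_imm8(digit: int, mod: int, rm: int, imm8: int) -> bytes:
    """Encode `80 /digit ib` (hypothetical helper, 32-bit addressing only)."""
    # mod=01/10 need a displacement; mod=00 with rm=100 needs a SIB byte,
    # and mod=00 with rm=101 needs a disp32. This sketch emits none of those.
    if mod in (0b01, 0b10) or (mod == 0b00 and rm in (0b100, 0b101)):
        raise ValueError("this sketch does not emit SIB or displacement bytes")
    return bytes([0x80, encode_modrm(mod, digit, rm), imm8 & 0xFF])


# ADC BYTE PTR [eax], 0x42: /2, mod=00, rm=000 ([eax])
print(encode_group1_rm8_imm8(2, 0b00, 0b000, 0x42).hex(" "))  # 80 10 42
```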

Answer 2:

Instruction Prefix                0 or 1 byte
Address-Size Prefix               0 or 1 byte
Operand-Size Prefix               0 or 1 byte
Segment Prefix                    0 or 1 byte
Opcode                            1 or 2 bytes
Mod R/M                           0 or 1 byte
SIB, Scale Index Base (386+)      0 or 1 byte
Displacement                      0, 1, 2 or 4 bytes (4 only 386+)
Immediate                         0, 1, 2 or 4 bytes (4 only 386+)
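As a rough sketch of the first stage of decoding, the legacy prefixes from the table above can be consumed before looking at the opcode. The prefix names below are illustrative labels; this ignores 64-bit REX prefixes and does not enforce the "0 or 1 of each" rule from the table:

```python
# Legacy prefix bytes grouped as in the table (FS/GS overrides are 386+).
PREFIXES = {
    0xF0: "lock", 0xF2: "repne", 0xF3: "rep",  # instruction prefixes
    0x67: "addr-size",                         # address-size prefix
    0x66: "op-size",                           # operand-size prefix
    0x26: "es", 0x2E: "cs", 0x36: "ss", 0x3E: "ds",
    0x64: "fs", 0x65: "gs",                    # segment overrides
}


def split_prefixes(code: bytes) -> tuple[list[str], bytes]:
    """Strip leading legacy prefixes; return their labels and the rest."""
    names = []
    i = 0
    while i < len(code) and code[i] in PREFIXES:
        names.append(PREFIXES[code[i]])
        i += 1
    return names, code[i:]


names, rest = split_prefixes(bytes.fromhex("f0 67 80 10 42"))
print(names, rest.hex(" "))
```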

Format of the postbyte (Mod R/M byte in the Intel manual)
---------------------------------------------------------
MM RRR MMM

MM  - Memory addressing mode
RRR - Register operand address (or the opcode extension for /digit instructions)
MMM - Memory operand address (or a register when MM=11)

RRR Register Names
Field  8bit  16bit  32bit
000    AL     AX     EAX
001    CL     CX     ECX
010    DL     DX     EDX
011    BL     BX     EBX
100    AH     SP     ESP
101    CH     BP     EBP
110    DH     SI     ESI
111    BH     DI     EDI
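The register table maps directly to a lookup by field value and operand size. A sketch, assuming the size has already been determined from the opcode and prefixes (`reg_name` is a hypothetical helper; REX-prefixed 64-bit forms are not covered):

```python
REGS = {
    8: ("AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"),
    16: ("AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"),
    32: ("EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"),
}


def reg_name(field: int, size: int) -> str:
    """Map a 3-bit register field (RRR, or MMM when MM=11) to a name."""
    return REGS[size][field & 0b111]


# Field 100 is AH in 8-bit operations but SP/ESP in 16/32-bit ones.
print(reg_name(0b100, 8), reg_name(0b100, 16), reg_name(0b100, 32))
```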

---

16-bit memory addressing (16-bit address size, i.e. 16-bit code without the 67h address-size prefix)
MMM   Default MM Field
Field Sreg     00        01          10             11=MMM is reg
000   DS       [BX+SI]   [BX+SI+o8]  [BX+SI+o16]
001   DS       [BX+DI]   [BX+DI+o8]  [BX+DI+o16]
010   SS       [BP+SI]   [BP+SI+o8]  [BP+SI+o16]
011   SS       [BP+DI]   [BP+DI+o8]  [BP+DI+o16]
100   DS       [SI]      [SI+o8]     [SI+o16]
101   DS       [DI]      [DI+o8]     [DI+o16]
110   SS       [o16]     [BP+o8]     [BP+o16]
111   DS       [BX]      [BX+o8]     [BX+o16]
Note: for MMM=110, MM=00 ([o16]) the default Sreg is DS, not SS.
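The 16-bit table, including the [o16] special case, can be expressed as a small function. This is a sketch that uses the table's own o8/o16 placeholders rather than real displacement values; `decode_mem16` is a hypothetical name:

```python
# 16-bit ModR/M memory forms (MM != 11), following the table above.
BASE16 = ("BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX")
SEG16 = ("DS", "DS", "SS", "SS", "DS", "DS", "SS", "DS")


def decode_mem16(mm: int, mmm: int) -> tuple[str, str]:
    """Return (operand text, default segment) for a 16-bit memory operand."""
    if mm == 0b11:
        raise ValueError("MM=11 selects a register, not memory")
    if mm == 0b00 and mmm == 0b110:
        return "[o16]", "DS"  # direct address: no BP involved, so DS
    disp = {0b00: "", 0b01: "+o8", 0b10: "+o16"}[mm]
    return f"[{BASE16[mmm]}{disp}]", SEG16[mmm]


print(decode_mem16(0b01, 0b110))  # ('[BP+o8]', 'SS')
```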

32-bit memory addressing (32-bit address size: 16-bit code with the 67h address-size prefix, or 32-bit code by default)
MMM   Default MM Field
Field Sreg     00        01          10             11=MMM is reg
000   DS       [EAX]     [EAX+o8]    [EAX+o32]
001   DS       [ECX]     [ECX+o8]    [ECX+o32]
010   DS       [EDX]     [EDX+o8]    [EDX+o32]
011   DS       [EBX]     [EBX+o8]    [EBX+o32]
100   SIB      [SIB]     [SIB+o8]    [SIB+o32]
101   SS       [o32]     [EBP+o8]    [EBP+o32]
110   DS       [ESI]     [ESI+o8]    [ESI+o32]
111   DS       [EDI]     [EDI+o8]    [EDI+o32]
Note: for MMM=101, MM=00 ([o32]) the default Sreg is DS, not SS.
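The same idea works for the 32-bit table; here the function also reports how many displacement bytes follow the postbyte. This sketch does not decode the SIB byte itself and does not account for the extra disp32 that appears when MM=00 and the SIB base field is 101:

```python
REGS32 = ("EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI")


def decode_mem32(mm: int, mmm: int) -> tuple[str, int]:
    """Return (operand text, displacement byte count) for a 32-bit operand.

    MMM=100 means a SIB byte follows; it is shown only as "SIB" here.
    """
    if mm == 0b11:
        raise ValueError("MM=11 selects a register, not memory")
    if mm == 0b00 and mmm == 0b101:
        return "[o32]", 4  # disp32 only, no base register
    base = "SIB" if mmm == 0b100 else REGS32[mmm]
    disp_len = {0b00: 0, 0b01: 1, 0b10: 4}[mm]
    suffix = {0: "", 1: "+o8", 4: "+o32"}[disp_len]
    return f"[{base}{suffix}]", disp_len


print(decode_mem32(0b01, 0b101))  # ('[EBP+o8]', 1)
```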

---

SIB is (Scale/Index/Base)
SS III BBB
Note: SIB address calculated as:
<sib address>=<Base>+<Index>*(2^(Scale))
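As a worked instance of the formula (with the postbyte's o8/o32 displacement added on top), here is a sketch that computes a 32-bit effective address from hypothetical register values:

```python
def sib_address(base: int, index: int, scale: int, disp: int = 0) -> int:
    """Base + Index * 2**Scale + disp, wrapped to 32 bits."""
    if not 0 <= scale <= 3:
        raise ValueError("the SS field is only 2 bits")
    return (base + (index << scale) + disp) & 0xFFFFFFFF


# Hypothetical values: base=0x1000, index=3, SS=10 (x4), o8=8
print(hex(sib_address(0x1000, 3, 0b10, 8)))  # 0x1014
```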

Field  Default Base
BBB    Sreg    Register   Note
000    DS      EAX
001    DS      ECX
010    DS      EDX
011    DS      EBX
100    SS      ESP
101    DS      o32        if MM=00 (Postbyte)
       SS      EBP        if MM<>00 (Postbyte)
110    DS      ESI
111    DS      EDI

Field Index
III   register   Note
000   EAX
001   ECX
010   EDX
011   EBX
100   none       no index register (ESP cannot be an index); SS normally 00
101   EBP
110   ESI
111   EDI

Field Scale coefficient
SS   =2^(SS)
00   1
01   2
10   4
11   8
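Putting the three SIB tables together, including the two special cases (index 100 and base 101 with MM=00), might look like this. The output format is an assumption chosen to mirror the tables above, and any postbyte displacement for MM=01/10 is added separately:

```python
REGS32 = ("EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI")


def decode_sib(sib: int, mm: int) -> str:
    """Render a SIB byte (SS III BBB) as text; mm is the postbyte MM field."""
    ss = (sib >> 6) & 0b11
    index = (sib >> 3) & 0b111
    base = sib & 0b111
    parts = []
    if base == 0b101 and mm == 0b00:
        parts.append("o32")  # no base register; a disp32 follows instead
    else:
        parts.append(REGS32[base])
    if index != 0b100:  # index 100 means "no index"
        parts.append(f"{REGS32[index]}*{1 << ss}")
    return "[" + "+".join(parts) + "]"


print(decode_sib(0x88, 0b00))  # [EAX+ECX*4]
```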

Source: https://stackoverflow.com/questions/26607462/x86-opcode-instruction-decoding

Tags

assembly

x86

reverse-engineering

disassembly

isa