Instruction Decoding in x86 architecture [closed]

Question

I am working on an operating system project for my lab where I have to work with the instruction pointer and instruction opcodes. Right now all I need to know is what type of instruction I am looking at. To find out, I read the data at the address the instruction pointer points to. The first byte of this data gives me the instruction type. For example, if the first byte is 0xC6, it is a MOVB instruction. In some cases, though, the first byte at the instruction pointer is 0x0F. According to the documentation, 0x0F means it is a two-byte instruction. My problem is with this type of instruction: I'm not sure how to find out the instruction type for a two-byte instruction.

After that, my second priority is to find out the operands of the instruction. I have no idea how to do that from code. Any sample code would be appreciated.

Third comes the need to find out the size of the instruction. As x86 is variable length, I want to know the size of each instruction. At first I planned to use a lookup table holding each instruction name and its size. But then I discovered that the same instruction can have different lengths. For example, when I ran objdump on a .o file I found two instructions: C6 00 62, which is MOVB $0x62,(%EAX), and C6 85 2C FF FF FF 00, which is MOVB $0x0,-0xD4(%EBP). Both have the same opcode (C6), but they are of different lengths.

So I'm in need of answers to those questions. It'll be highly appreciated if someone can give me some solutions.

Answer 1:

Basically what you need is a set of nested case statements, implementing a finite state machine scanner, where each level inspects some byte of the instruction (typically left to right) to determine what it does.

Your top-level case statement will pretty much be 256 cases, one for each opcode byte; you'll find that some of the opcodes (especially the so-called "prefix" bytes) cause the top level to loop (picking up multiple prefix bytes that precede the main opcode byte). Sub-cases will acquire structure according to the opcode structure of the x86; you'll almost certainly end up with MODRM and SIB addressing mode byte decoders/subroutines.
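As a rough illustration of that structure, here is a minimal C++ sketch, not a complete decoder. It assumes 32-bit protected mode with 32-bit addressing, handles only a handful of opcodes, and treats the 0x67 address-size prefix as unhandled. The names `modrm_length`, `Decoded`, and `decode` are made up for this example. The 0x0F escape byte simply switches to a second switch over the following byte, and the MODRM/SIB helper is what makes the two C6 encodings from the question come out with different lengths.

```cpp
#include <cstddef>
#include <cstdint>

// Size of the ModRM byte plus any SIB byte and displacement that follow it.
// Assumption: 32-bit addressing (no 0x67 address-size prefix).
std::size_t modrm_length(const std::uint8_t* p) {
    const unsigned mod = p[0] >> 6;   // bits 7..6
    const unsigned rm = p[0] & 7;     // bits 2..0
    if (mod == 3) return 1;           // register operand: ModRM byte only
    std::size_t len = 1;
    const bool has_sib = (rm == 4);   // rm == 100b means a SIB byte follows
    if (has_sib) len += 1;
    if (mod == 1) len += 1;                          // disp8
    else if (mod == 2) len += 4;                     // disp32
    else if (rm == 5) len += 4;                      // mod 0, rm 5: disp32 only
    else if (has_sib && (p[1] & 7) == 5) len += 4;   // mod 0, SIB base 5: disp32
    return len;
}

struct Decoded {
    bool two_byte;        // true if the opcode was escaped by 0x0F
    std::uint8_t opcode;  // opcode byte (the byte after 0x0F when two_byte)
    std::size_t length;   // total length in bytes, 0 if not handled here
};

Decoded decode(const std::uint8_t* ip) {
    const std::uint8_t* p = ip;
    bool operand16 = false;
    bool more_prefixes = true;
    while (more_prefixes) {           // the top level loops over prefix bytes
        switch (*p) {
        case 0x66: operand16 = true; ++p; break;      // operand-size override
        case 0xF0: case 0xF2: case 0xF3:              // LOCK, REPNE, REP
        case 0x26: case 0x2E: case 0x36:              // segment overrides
        case 0x3E: case 0x64: case 0x65:
            ++p; break;
        default: more_prefixes = false; break;
        }
    }
    const std::size_t imm_wide = operand16 ? 2 : 4;   // imm16 vs imm32
    Decoded d{false, *p++, 0};
    if (d.opcode == 0x0F) {           // escape byte: two-byte opcode map
        d.two_byte = true;
        d.opcode = *p++;
        switch (d.opcode) {
        case 0xAF: case 0xB6: case 0xB7:   // IMUL r, r/m; MOVZX r, r/m
            p += modrm_length(p); break;
        case 0x84: case 0x85:              // JE / JNE with rel16/rel32
            p += imm_wide; break;
        default: return d;                 // not covered by this sketch
        }
    } else {
        switch (d.opcode) {
        case 0x90: case 0xC3: break;                   // NOP, RET
        case 0x88: case 0x89: case 0x8A: case 0x8B:    // MOV r/m <-> r
            p += modrm_length(p); break;
        case 0xC6: p += modrm_length(p) + 1; break;         // MOV r/m8, imm8
        case 0xC7: p += modrm_length(p) + imm_wide; break;  // MOV r/m, imm
        default: return d;                 // not covered (includes 0x67)
        }
    }
    d.length = static_cast<std::size_t>(p - ip);
    return d;
}
```

Under these assumptions, `decode` reports a length of 3 for C6 00 62 and 7 for C6 85 2C FF FF FF 00. A real decoder would also inspect the reg field of the MODRM byte, since opcodes such as C6 and C7 are "groups" whose meaning depends on it.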

I've done this; the work is annoying because of the details, but not hard. You can get a pretty good solution in several hundred lines of code if you are careful. If you insist on doing the whole instruction set (vector registers and opcodes, esp. for Haswell etc.) you're likely to end up with something bigger; Intel has been jamming instructions into every dark corner they can find.

You really need an opcode map; I'm pretty sure there is one in the Intel manuals. I've found this link to be pretty useful: http://www.ref.x86asm.net/coder32.html

EDIT Sept 2015: Here at SO I provide C code that implements this: https://stackoverflow.com/a/23843450/120163

Answer 2:

An additional approach is to actually build a proper parser for the assembly, using one of the many parser generator frameworks (such as the ubiquitous yacc). This may result in an easier to maintain and more readable implementation than nested switch statements with a large number of cases.

There's also an intermediate approach, whereby a table-based parser is implemented "by hand". One example is here: https://github.com/libcpu/libcpu/blob/master/arch/x86/x86_decode.cpp
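To give a feel for the hand-written table idea, here is a small C++ sketch. It is not taken from libcpu; the flag names, `OpInfo`, `make_one_byte_table`, and `table_length` are all hypothetical, and the table covers only a few one-byte opcodes. It assumes 32-bit mode with no prefix bytes. Each table entry records whether a MODRM byte and an immediate follow the opcode, and a single generic routine turns those flags into a length.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical per-opcode flags for this sketch.
constexpr std::uint8_t kValid = 1;   // entry is filled in
constexpr std::uint8_t kModRM = 2;   // a ModRM byte follows the opcode
constexpr std::uint8_t kImm8 = 4;    // an 8-bit immediate follows
constexpr std::uint8_t kImm32 = 8;   // a 32-bit immediate/offset follows

struct OpInfo {
    const char* name;
    std::uint8_t flags;
};

// Only a few entries are filled in; everything else stays invalid (flags 0).
std::array<OpInfo, 256> make_one_byte_table() {
    std::array<OpInfo, 256> t{};
    t[0x88] = {"MOV r/m8, r8", kValid | kModRM};
    t[0x89] = {"MOV r/m32, r32", kValid | kModRM};
    t[0x90] = {"NOP", kValid};
    t[0xC3] = {"RET", kValid};
    t[0xC6] = {"MOV r/m8, imm8", kValid | kModRM | kImm8};
    t[0xC7] = {"MOV r/m32, imm32", kValid | kModRM | kImm32};
    t[0xE8] = {"CALL rel32", kValid | kImm32};
    return t;
}

// ModRM + optional SIB + displacement size, assuming 32-bit addressing.
std::size_t modrm_size(const std::uint8_t* p) {
    const unsigned mod = p[0] >> 6;
    const unsigned rm = p[0] & 7;
    if (mod == 3) return 1;                          // register operand
    std::size_t len = 1;
    const bool has_sib = (rm == 4);
    if (has_sib) len += 1;
    if (mod == 1) len += 1;                          // disp8
    else if (mod == 2) len += 4;                     // disp32
    else if (rm == 5) len += 4;                      // disp32, no base
    else if (has_sib && (p[1] & 7) == 5) len += 4;   // SIB with disp32 base
    return len;
}

// Returns the instruction length, or 0 if the opcode is not in the table.
std::size_t table_length(const std::uint8_t* ip) {
    static const std::array<OpInfo, 256> table = make_one_byte_table();
    const OpInfo& info = table[ip[0]];
    if (!(info.flags & kValid)) return 0;
    std::size_t len = 1;                             // the opcode byte
    if (info.flags & kModRM) len += modrm_size(ip + len);
    if (info.flags & kImm8) len += 1;
    if (info.flags & kImm32) len += 4;
    return len;
}
```

The attraction of this layout is that supporting another instruction usually means adding a table row rather than another case branch; a second 256-entry table could cover the opcodes that follow the 0x0F escape byte.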

Answer 3:

KVM has a very sophisticated x86 emulator/decoder that may be reusable by your project.

Source: https://stackoverflow.com/questions/20319704/instruction-decoding-in-x86-architecture

Tags

c++

assembly

x86

decode