What is the actual relation between assembly, machine code, bytecode, and opcode?

I have read most of the SO questions about assembly and machine code, such as this,

6 Answers
  •  长情又很酷
    2020-12-29 12:18

    Yes, each architecture has an instruction set reference that documents how instructions are encoded. For x86, it's the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2 (2A, 2B & 2C): Instruction Set Reference, A-Z.

    Most assemblers, including nasm, can produce a listing file for you. Feeding your sample code to nasm -l, we get:

     1                                  global main
     2                                  section .text
     3
     4                                  main:
     5 00000000 E800000000                call write
     6
     7                                  write:
     8 00000005 B804000002                mov rax, 0x2000004
     9 0000000A BF01000000                mov rdi, 1
    10 0000000F 48BE-                     mov rsi, message
    11 00000011 [0000000000000000]
    12 00000019 BA0E000000                mov rdx, length
    13 0000001E 0F05                      syscall
    14
    15                                  section .data
    16 00000000 48656C6C6F2C20776F-     message: db 'Hello, world!', 0xa
    17 00000009 726C64210A
    18                                  length: equ $ - message
    

    You can see the generated machine code in the third column (the first is the line number, the second is the address, given as an offset from the start of the section).

    Note that the output of the assembler is an object file, and the output of the linker is an executable. Both of those have a complex structure and contain more than just the machine code. This is why your hexdump differs from the above listing.

    The opcode is generally considered to be the part of a machine code instruction that specifies the operation to perform. For example, in the above code you have B804000002 mov rax, 0x2000004. There, B8 is the opcode and 04000002 is the 32-bit immediate operand 0x2000004, stored in little-endian byte order.
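    As a rough illustration of that split, the Python sketch below takes the five bytes from the listing and separates the opcode byte from the immediate. The helper name split_mov_imm32 is hypothetical, and it only handles the 5-byte B8+r "mov r32, imm32" form; it is not a general x86 decoder.

```python
def split_mov_imm32(code: bytes):
    """Split a 5-byte 'mov r32, imm32' instruction (opcode B8+r) into its parts.

    Hypothetical helper for illustration only; real x86 decoding must also
    handle prefixes, ModRM bytes, and many other encodings.
    """
    if len(code) != 5 or not 0xB8 <= code[0] <= 0xBF:
        raise ValueError("expected a 5-byte B8+r mov instruction")
    opcode = code[0]
    register = opcode - 0xB8  # the low 3 bits of the opcode select the register
    immediate = int.from_bytes(code[1:], "little")  # immediates are little-endian
    return opcode, register, immediate


opcode, register, immediate = split_mov_imm32(bytes.fromhex("B804000002"))
print(hex(opcode), register, hex(immediate))  # 0xb8 0 0x2000004
```

    Reading the four immediate bytes 04 00 00 02 from least to most significant gives 0x02000004, which is the value written in the assembly source.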

    Bytecode is not a term typically used in the assembly context. It can be thought of as the machine code for a virtual machine.
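    To make the virtual machine analogy concrete, here is a small sketch using CPython, whose interpreter executes bytecode. It relies on the standard dis module; the function add is just a made-up example, and the exact bytes and mnemonics printed depend on the Python version.

```python
import dis


def add(a, b):
    return a + b


# co_code holds the raw bytecode: the "machine code" of CPython's virtual machine.
raw = add.__code__.co_code
print(raw.hex())

# dis decodes each instruction into an opcode number and a mnemonic,
# much like a disassembler does for native machine code.
for ins in dis.get_instructions(add):
    print(ins.offset, ins.opcode, ins.opname, ins.argrepr)
```

    The relation mirrors the native case: the mnemonics play the role of assembly, the opcode numbers play the role of x86 opcodes, and the raw bytes are what the (virtual) machine actually executes.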


    As for a walkthrough: x86 is a very complicated architecture, but your sample code happens to contain a simple instruction, syscall, so let's see how to turn that into machine code. Open the reference PDF mentioned above and go to the section about syscall in chapter 4. You will immediately see it listed as opcode 0F 05. Since it doesn't take any operands, we are done: those 2 bytes are the machine code.

    How do we turn it back? Go to Appendix A: Opcode Map. Section A.1 tells us: "For 2-byte opcodes beginning with 0FH (Table A-3), skip any instruction prefixes, the 0FH byte (0FH may be preceded by 66H, F2H, or F3H) and use the upper and lower 4-bit values of the next opcode byte to index table rows and columns." So we skip the 0F, split the 05 into 0 and 5, and look that up in Table A-3 at row 0, column 5. We find that it is the syscall instruction.
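    The table lookup described above comes down to splitting one byte into two 4-bit halves. A minimal sketch, assuming a plain 2-byte opcode with no prefixes (the function name two_byte_opcode_index is hypothetical):

```python
def two_byte_opcode_index(code: bytes):
    """Return the (row, column) used to index the 2-byte opcode map.

    Simplified sketch: assumes the bytes start with the 0F escape byte and
    carry no instruction prefixes.
    """
    if len(code) < 2 or code[0] != 0x0F:
        raise ValueError("expected a 2-byte opcode starting with 0F")
    second = code[1]
    row = second >> 4       # upper 4 bits select the table row
    column = second & 0x0F  # lower 4 bits select the table column
    return row, column


row, column = two_byte_opcode_index(bytes.fromhex("0F05"))
print(f"row {row:X}, column {column:X}")  # row 0, column 5
```

    The mapping from that row and column to the mnemonic still has to come from the table in the manual; the code only computes where to look.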
