What is the actual relation between assembly, machine code, bytecode, and opcode?
I have read most of the SO questions about assembly and machine code, such as this,
Yes, each architecture has an instruction set reference that documents how instructions are encoded. For x86, it is the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2 (2A, 2B & 2C): Instruction Set Reference, A-Z.
Most assemblers, including nasm, can produce a listing file for you. Feeding your sample code to nasm -l, we get:
         1                                  global main
         2                                  section .text
         3
         4                                  main:
         5 00000000 E800000000                call write
         6
         7                                  write:
         8 00000005 B804000002                mov rax, 0x2000004
         9 0000000A BF01000000                mov rdi, 1
        10 0000000F 48BE-                     mov rsi, message
        11 00000011 [0000000000000000]
        12 00000019 BA0E000000                mov rdx, length
        13 0000001E 0F05                      syscall
        14
        15                                  section .data
        16 00000000 48656C6C6F2C20776F-     message: db 'Hello, world!', 0xa
        17 00000009 726C64210A
        18                                  length: equ $ - message
You can see the generated machine code in the third column (the first column is the line number, the second is the address, given as an offset from the start of the current section).
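As a quick sanity check on how to read that column, you can decode some of the hex bytes by hand. Here is a minimal Python sketch (the variable names are mine; the hex strings are copied from listing lines 12, 16 and 17) showing that the data bytes are simply the ASCII codes of the message, and that the operand of the mov rdx, length instruction is the message length:

```python
# Hex bytes copied from listing lines 16 and 17. The trailing '-' on line 16
# only marks that the bytes continue on the next listing line.
data_hex = "48656C6C6F2C20776F" + "726C64210A"

message = bytes.fromhex(data_hex)
print(message)        # the raw bytes that `db` emitted
print(len(message))   # what `length: equ $ - message` evaluates to

# Listing line 12 is BA 0E000000: one opcode byte followed by a 4-byte
# immediate operand stored least significant byte first.
length_imm = int.from_bytes(bytes.fromhex("0E000000"), "little")
print(length_imm == len(message))
```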
Note that the output of the assembler is an object file, and the output of the linker is an executable. Both of those have a complex structure and contain more than just the machine code. This is why your hexdump differs from the above listing.
The opcode is generally considered to be the part of a machine code instruction that specifies the operation to perform. For example, in the above code you have B804000002 mov rax, 0x2000004. There, B8 is the opcode and 04000002 is the immediate operand, stored little-endian (least significant byte first), so the bytes 04 00 00 02 represent the value 0x02000004.
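To make the opcode/operand split concrete, here is a rough Python sketch that pulls apart that one instruction. It handles only the 5-byte "B8+r imm32" form of mov and nothing else; the function name decode_mov_imm32 and the register table are my own illustration, not part of any real disassembler.

```python
# 32-bit register names in x86 encoding order (register numbers 0..7).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_imm32(code: bytes) -> str:
    """Decode a single `mov r32, imm32` instruction (opcodes B8..BF only)."""
    if len(code) != 5 or not 0xB8 <= code[0] <= 0xBF:
        raise ValueError("not a B8+r imm32 mov")
    reg = REG32[code[0] - 0xB8]                  # opcode byte also selects the register
    imm = int.from_bytes(code[1:5], "little")    # 4-byte operand, little-endian
    return f"mov {reg}, {imm:#x}"

print(decode_mov_imm32(bytes.fromhex("B804000002")))  # mov eax, 0x2000004
print(decode_mov_imm32(bytes.fromhex("BF01000000")))  # mov edi, 0x1
```

The sketch prints eax rather than rax because this encoding is the 32-bit form of the instruction. In 64-bit mode a write to a 32-bit register zero-extends into the full 64-bit register, so the assembler can use the shorter encoding for mov rax, 0x2000004 without changing the result.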
Bytecode is not a term typically used in the assembly context; it can be thought of as the machine code for a virtual machine.
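If you want to see what bytecode looks like, CPython is one example of such a virtual machine, and its standard dis module can show the bytecode a function was compiled to. This is only an illustration of the idea: the function add_one is made up, and the exact opcodes and byte values differ between Python versions.

```python
import dis

def add_one(x):
    return x + 1

# The compiled bytecode itself, as raw bytes (the VM's "machine code").
raw = add_one.__code__.co_code
print(raw.hex())

# The same bytes decoded into instructions: offset, numeric opcode,
# mnemonic, and a readable form of the argument.
instructions = list(dis.get_instructions(add_one))
for ins in instructions:
    print(ins.offset, ins.opcode, ins.opname, ins.argrepr)
```

Just as with the nasm listing, there is a raw byte form and a mnemonic form of the same instructions; the difference is that these bytes are interpreted by the Python VM rather than executed by a CPU.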
For a walkthrough, x86 is a very complicated architecture, but your sample code happens to contain a simple instruction, syscall, so let's see how to turn that into machine code. Open the reference PDF mentioned above and go to the section about syscall in chapter 4. You will immediately see it listed as opcode 0F 05. Since it doesn't take any operands, we are done: those 2 bytes are the machine code.
How do we turn it back? Go to Appendix A: Opcode Map. Section A.1 tells us: "For 2-byte opcodes beginning with 0FH (Table A-3), skip any instruction prefixes, the 0FH byte (0FH may be preceded by 66H, F2H, or F3H) and use the upper and lower 4-bit values of the next opcode byte to index table rows and columns."
So we skip the 0F, split the 05 into 0 and 5, and look that up in Table A-3 at row 0, column 5. There we find the syscall instruction.
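The reverse lookup can be sketched in a few lines of Python. This is a toy under stated assumptions: the table holds only the single entry discussed here (the real Table A-3 has 256 cells), it ignores prefixes, and the names TWO_BYTE_MAP_EXCERPT and lookup_two_byte_opcode are invented for the example.

```python
# Tiny excerpt of the two-byte opcode map, keyed by (row, column).
# Only the entry from the walkthrough is filled in.
TWO_BYTE_MAP_EXCERPT = {(0x0, 0x5): "syscall"}

def lookup_two_byte_opcode(code: bytes) -> str:
    """Look up a 0F xx opcode by splitting xx into its two 4-bit halves."""
    if len(code) < 2 or code[0] != 0x0F:
        raise ValueError("expected a two-byte opcode starting with 0F")
    second = code[1]
    row = second >> 4        # upper 4 bits index the table row
    column = second & 0x0F   # lower 4 bits index the table column
    return TWO_BYTE_MAP_EXCERPT.get(
        (row, column), f"unknown (row {row:X}, column {column:X})"
    )

print(lookup_two_byte_opcode(bytes.fromhex("0F05")))  # syscall
```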