How could I generate and execute machine code at runtime?

Question

The closest I have gotten to assembly is building my own Java class library, which loads class files and allows you to create, compile, and decompile classes. While working on this project, I wondered how the Java Virtual Machine actually generates native machine code at runtime during JIT optimizations.

It got me thinking: how could one generate machine code and execute it at runtime with assembly, and as a bonus, without a JIT compiler library, or "manually"?

Answer 1:

Your question changed substantially (in July 2017). The initial version referred to the EX (execute) instruction of IBM mainframes.

how could one generate machine code and execute it at runtime with assembly...?

In practice, you would use a JIT compilation library (there are many of them) or a dynamic loader. At the lowest level, they all do the same thing: they write byte sequences representing valid machine code (a sequence of many machine instructions) into a memory segment of your virtual address space, that segment has to be made executable (read about the NX bit), and then some of your code jumps indirectly to that address or, more often, calls it indirectly through a function pointer. Most JVM implementations use JIT compilation techniques.

...and as a bonus, without a JIT compiler library, or "manually"?

Supposing you already have some valid machine code for the processor architecture your program is running on, you could obtain a memory segment (e.g. with mmap(2) on Linux), copy the code into it, and then make it executable (e.g. with mprotect(2)). Most other operating systems provide similar system calls.
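To make the mmap(2)/mprotect(2) route concrete, here is a minimal sketch, not a definitive implementation. It is written in Python with ctypes purely as a convenient host language (my choice, not something from the question), and it assumes a 64-bit x86 process on Linux or another Unix using the System V AMD64 calling convention. The names SUPPORTED, emit_add_const and NativeFunction are made up for illustration. The bytes are generated at runtime, copied into a fresh anonymous mapping, made executable, and then called through a function pointer.

```python
import ctypes
import mmap
import platform
import struct
import sys

# Assumption: 64-bit x86 on a Unix-like OS (first int argument in EDI, result in EAX).
SUPPORTED = (
    platform.machine().lower() in ("x86_64", "amd64")
    and sys.platform != "win32"
    and ctypes.sizeof(ctypes.c_void_p) == 8
)


def emit_add_const(k: int) -> bytes:
    """Generate machine code for: int f(int x) { return x + k; }"""
    return (
        b"\x89\xf8"                       # mov eax, edi
        + b"\x05" + struct.pack("<i", k)  # add eax, imm32
        + b"\xc3"                         # ret
    )


class NativeFunction:
    """Hypothetical helper: owns an executable mapping holding `code`."""

    def __init__(self, code: bytes, restype, *argtypes):
        pages = (len(code) + mmap.PAGESIZE - 1) // mmap.PAGESIZE
        size = max(1, pages) * mmap.PAGESIZE
        # 1. Get a writable anonymous memory segment (mmap with fd -1).
        self._buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE,
                              prot=mmap.PROT_READ | mmap.PROT_WRITE)
        # 2. Copy the machine code into it.
        self._buf.write(code)
        addr = ctypes.addressof(ctypes.c_char.from_buffer(self._buf))
        # 3. Flip the segment from writable to executable with mprotect(2).
        libc = ctypes.CDLL(None, use_errno=True)
        libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
        libc.mprotect.restype = ctypes.c_int
        if libc.mprotect(addr, size, mmap.PROT_READ | mmap.PROT_EXEC) != 0:
            raise OSError(ctypes.get_errno(), "mprotect failed")
        # 4. Wrap the address in a function pointer type so it can be called.
        self._fn = ctypes.CFUNCTYPE(restype, *argtypes)(addr)

    def __call__(self, *args):
        return self._fn(*args)


def make_adder(k: int) -> NativeFunction:
    return NativeFunction(emit_add_const(k), ctypes.c_int, ctypes.c_int)


if __name__ == "__main__" and SUPPORTED:
    add_41 = make_adder(41)
    print(add_41(1))
```

The mapping is never writable and executable at the same time, which is what hardened systems tend to require. On other architectures you would also need different instruction bytes and, on some of them, an explicit instruction cache flush, which this sketch does not attempt.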

If you use a JIT compilation library like asmjit, libjit, libgccjit, LLVM, or many others, you first construct in memory a representation (similar to an abstract syntax tree) of the code to be generated, then ask the JIT library to emit machine code for it. You could even write your own JIT compilation code, but it is a lot of work (you need to understand all the details of your instruction set, e.g. x86 for PCs). By the way, generating fast-running machine code is really difficult, because you need to optimize like compilers do and care about details like instruction scheduling and register allocation. That is why using an existing optimizing JIT compilation library (like libgccjit or LLVM) is preferable. By contrast, simpler JIT libraries like asmjit, libjit, or GNU lightning don't optimize much and generate poor machine code.

If you use a dynamic loader (e.g. dlopen(3) on POSIX) you would use some external compiler to produce a shared library (that is a plugin) and then you ask the dynamic linker to load it in your process (and handle appropriate relocations) and get by name (using dlsym(3)) some function addresses from it.

Some language implementations (notably SBCL for Common Lisp) are able to emit good machine code on the fly at every REPL interaction. In essence, their runtime embeds a full compiler (containing a JIT compilation part).

A trick I often use on Linux is to emit some C (or C++) code at runtime into a temporary file (that is, to compile some domain-specific language to C or C++), fork a compilation of it as a plugin, and dynamically load it. On current computers (laptops, desktops, servers) this is fast enough to remain compatible with an interactive loop.
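The emit-C-then-load trick can be sketched in a few lines. This is only an illustration under stated assumptions: a POSIX system with a C compiler reachable as cc, gcc or clang, and Python as the host language (ctypes.CDLL goes through the dynamic loader, and attribute lookup on the loaded library resolves the symbol by name). The names C_SOURCE, find_compiler, load_plugin and add3_via_plugin are hypothetical.

```python
import ctypes
import os
import shutil
import subprocess
import tempfile

# C code "emitted at runtime"; a fixed string here to keep the sketch short.
C_SOURCE = "int add3(int a, int b, int c) { return a + b + c; }\n"


def find_compiler():
    """Return the path of the first C compiler found on PATH, or None."""
    for name in ("cc", "gcc", "clang"):
        path = shutil.which(name)
        if path:
            return path
    return None


def load_plugin(source: str, workdir: str, compiler: str) -> ctypes.CDLL:
    """Write `source` to a file, compile it as a shared library, and load it."""
    src = os.path.join(workdir, "plugin.c")
    lib = os.path.join(workdir, "plugin.so")
    with open(src, "w") as f:
        f.write(source)
    # Fork the external compiler to build the plugin.
    subprocess.run([compiler, "-shared", "-fPIC", "-O2", "-o", lib, src],
                   check=True)
    # Ask the dynamic loader to map the plugin into this process.
    return ctypes.CDLL(lib)


def add3_via_plugin(a: int, b: int, c: int):
    """Return a + b + c computed by freshly compiled code, or None if no compiler."""
    compiler = find_compiler()
    if compiler is None:
        return None
    with tempfile.TemporaryDirectory() as workdir:
        plugin = load_plugin(C_SOURCE, workdir, compiler)
        add3 = plugin.add3  # symbol lookup by name
        add3.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int]
        add3.restype = ctypes.c_int
        return add3(a, b, c)


if __name__ == "__main__":
    print(add3_via_plugin(3, 4, 5))
```

The appeal of this design is that the external compiler does all the optimization work, so the generated code is as good as anything the compiler produces. The cost is one process fork per compilation.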

Read also about eval (in particular the famous SICP book), metaprogramming, multistage programming, self-modifying code, continuations, compilers (the Dragon Book), Scott's Programming Language Pragmatics, and J. Pitrat's blog.

Answer 2:

In the comments I gave you a link to a file explaining things thoroughly.

Most assembly languages implement subroutines (the assembly term for functions, as far as your googling is concerned) with two instructions, call and ret, or something similar.

The implementation is nearly the same as a jump, except that call pushes the address of the next instruction onto the stack and ret pops it. That is why it is very important to keep the stack balanced inside the subroutine. Since registers are limited and may already hold important values, the stack is also where you keep your local variables, which is why balancing becomes an issue. You could of course do all this yourself with a jump and some pushing and popping.

As far as "arguments" are concerned, a simple method is to use registers. This becomes a problem if you need to pass more arguments than there are registers. A more robust method is to push the arguments onto the stack before the call. This is what many real 32-bit calling conventions do. An example from the link I provided, for a subroutine adding 3 numbers:

# Save old EBP
pushl %ebp
# Change EBP
movl %esp, %ebp
# Save callee-save registers if necessary
pushl %ebx
pushl %esi
pushl %edi
# Allocate space for local variable
subl $4, %esp
# Perform the addition
movl 8(%ebp), %eax
addl 12(%ebp), %eax
addl 16(%ebp), %eax
movl %eax, -16(%ebp)
# Copy the return value to EAX
movl -16(%ebp), %eax
# Restore callee-save registers if necessary
movl -12(%ebp), %edi
movl -8(%ebp), %esi
movl -4(%ebp), %ebx
# Restore ESP
movl %ebp, %esp
# Restore EBP
popl %ebp
# Return to the caller
ret

Calling the subroutine:

# Save caller-save registers if necessary
pushl %eax
pushl %ecx
pushl %edx
# Push parameters
pushl $5
pushl $4
pushl $3
# Call add3
call add3
# Pop parameters
addl $12, %esp
# Save return value
movl %eax, wherever
# Restore caller-save registers if necessary
popl %edx
popl %ecx
popl %eax
# Proceed!

As you can see, this takes more work than in high-level languages. The PDF contains a detailed explanation, including how the stack works, but note that:

You need to define how to handle register usage. In this example both the caller and the subroutine save registers, just in case. You can of course simplify.
Arguments and local variables are addressed relative to the frame pointer (EBP): arguments at positive offsets, locals at negative offsets.
If this is a small thing you're making for yourself, you can skip all this stack juggling and just set aside registers for passing arguments and return values, maybe as practice before you move on to more advanced stuff.
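If the offsets in the add3 example are hard to follow, a toy model of the stack can help. The sketch below is my own illustration (the names Stack32 and trace_add3 are made up, and the saved registers and return address are stored as labels rather than real values). It replays the pushes from the listing and records what ends up at each offset from EBP, then checks that the stack pointer is back where it started after the caller pops the parameters.

```python
class Stack32:
    """Toy model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, top=0x1000):
        self.top = top
        self.esp = top
        self.mem = {}  # address -> value (or a label for saved state)

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def trace_add3(a, b, c):
    stack = Stack32()
    ebp = "caller's ebp"

    # Caller: push the parameters last-to-first; `call` pushes the return address.
    for arg in (c, b, a):
        stack.push(arg)
    stack.push("return address")

    # Callee prologue: pushl %ebp ; movl %esp, %ebp ; save registers ; subl $4, %esp
    stack.push(ebp)
    ebp = stack.esp
    for saved in ("ebx", "esi", "edi"):
        stack.push(saved)
    stack.esp -= 4  # room for one local variable at -16(%ebp)

    # Body: sum the three arguments into the local, then copy it to EAX.
    stack.mem[ebp - 16] = (
        stack.mem[ebp + 8] + stack.mem[ebp + 12] + stack.mem[ebp + 16]
    )
    eax = stack.mem[ebp - 16]
    frame = {off: stack.mem[ebp + off]
             for off in (16, 12, 8, 4, 0, -4, -8, -12, -16)}

    # Epilogue: movl %ebp, %esp ; popl %ebp ; ret
    stack.esp = ebp
    ebp = stack.pop()
    return_address = stack.pop()
    assert return_address == "return address"

    # Caller: addl $12, %esp removes the three parameters.
    stack.esp += 12
    return eax, frame, stack.esp == stack.top


if __name__ == "__main__":
    eax, frame, balanced = trace_add3(3, 4, 5)
    for off, value in frame.items():
        print(f"{off:+d}(%ebp): {value}")
    print("eax =", eax, "balanced =", balanced)
```

The model ignores the caller-save pushes around the call, since they are pushed and popped symmetrically and do not affect the offsets inside the subroutine.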

Answer 3:

To execute a piece of x86 machine code, use the jmp instruction to jump to its beginning. Note that the CPU doesn't know where the code ends, so you have to make manual arrangements. A better way is to use call to invoke that machine code and have it return with a ret instruction somewhere in the code.

There is no direct way to execute just a single instruction as that is usually pretty pointless. I'm not sure what you are trying to achieve.

Source: https://stackoverflow.com/questions/43255053/how-could-i-generate-and-execute-machine-code-at-runtime

Tags

language-agnostic

code-generation

execution

instructions