Question
I've got a project involving emulation (if you look at my post history, you'll see how far I've come!) and I'm looking to do pseudo-binary-translation in C: I want to write C code, and tune the optimizers and/or compilers, so that the contents of each case in my switch statement compile to a single assembly instruction. This is mainly for very standard instructions such as movs, add, SR and other simple bit-manipulation and arithmetic instructions. I'm hoping to do this for ARM and x86-64 at the same time, writing as little of it as possible in either assembly language.
If what I'm describing doesn't exist, is there some sort of "assembly language" in which I can write my code and then compile it to both x86-64 and ARM?
Answer 1:
To clearly answer this part:
... then I wonder if there's some sort of "assembly language" that I can use to write my code and then compile that assembly into x86-64 and ARM.
That's exactly what LLVM IR is targeting.
The LLVM representation aims to be light-weight and low-level while being expressive, typed, and extensible at the same time. It aims to be a “universal IR” of sorts, by being at a low enough level that high-level ideas may be cleanly mapped to it (similar to how microprocessors are “universal IR’s”, allowing many source languages to be mapped to them).
For example:
You can represent this C function
int mul_add(int x, int y, int z) {
    return x * y + z;
}
with this LLVM IR
define i32 @mul_add(i32 %x, i32 %y, i32 %z) {
entry:
  %tmp = mul i32 %x, %y
  %tmp2 = add i32 %tmp, %z
  ret i32 %tmp2
}
Answer 2:
To put it pointedly, the "assembly language" you're talking about is ... C.
That's because many C expressions map directly to single assembly instructions, even on different platforms. The following is partially hypothetical, but it shows some of the instructions a given C expression may compile to on x86, ARM or SPARC (I chose those three because they're the ones I know best):
C code          x86 asm                    ARM asm            SPARC asm
{               enter                      push {lr}          save %sp, ..., %sp
}               leave                      pop {pc}           restore
a += b;         add %ebx, %eax             add R0, R1         add %l0, %l1, %l0
a = b + c;      lea (%ebx, %ecx), %eax     add R0, R1, R2     add %l2, %l1, %l0
a = 0;          xor %eax, %eax             mov R0, #0         clr %l0
a++;            inc %eax                   add R0, #1         inc %l0
a--;            dec %eax                   sub R0, #1         dec %l0
(*ptr)++;       inc (%eax)                 -                  -
a = ~b;         mov %ebx, %eax; not %eax   mvn R0, R1         not %l1, %l0
ptr = &a;       lea a, %eax                ldr R0, =a         set a, %l0
a = b[c];       mov (%ebx, %ecx), %eax     ldr R0, [R1, R2]   ld [%l1+%l2], %l0
(void)func();   call func                  blx func           call func
if (a)          test %eax, %eax; jnz       tst R0, R0; bne    tst %l0; bnz
Of course, not everything you can write as one line of C will turn into a single assembly instruction. It also depends strongly on the instruction set whether a given multi-term operation can be "flattened" into a single multi-operand instruction or requires a sequence of "more primitive" instructions.
C compilers have long translated source code into an "intermediate representation" before the final conversion to assembly. That step is similar to what modern x86 CPUs do in hardware when they "compile" x86 instructions into the lower-level micro-ops that the chip's actual execution units process. Nor is it new for that intermediate layer to be codified and documented, as has happened with LLVM IR; Java bytecode or Forth, for example, conceptually fit that scheme.
I'd go for C ... and look at the assembly output. It is quite likely to be as compact as it can be already, and on platforms where the corresponding "compound" operation is available, likely to be more compact than LLVM IR (say, on a CPU with a multiply-accumulate instruction, the mul and add in the example auselen gave will go down to a single instruction).
Answer 3:
If you want to emit machine code at run time, you need a just-in-time (JIT) translation library. You might consider GNU lightning, libjit, LLVM, GCCJIT, asmjit ...
You could also (on Linux) generate C code into a file, fork a compiler to build that file into a shared object, and then dlopen(3) that .so plugin...
As I commented: cross-platform assembly does not and cannot exist (because systems have different instruction sets and ABI conventions). Consider instead generating C code, or perhaps LLVM IR.
If you are writing an interpreter (and that includes many emulators), also consider threaded-code techniques and bytecode generation.
Source: https://stackoverflow.com/questions/25484505/are-there-c-functions-or-macros-specifically-designed-to-compile-1-to-1-with-ass