Are there C functions or macros specifically designed to compile 1 to 1 with assembly instructions for bit manipulations in a cross-platform manner?

被刻印的时光 ゝ submitted on 2019-12-12 16:24:45

Question


I've got a project involving emulation (if you look at my post history, you'll see how far I've come!), and I'm looking to do pseudo-binary translation in C: I want to lean on the compiler and its optimizer so that the C code in each case of my switch statement compiles to a single assembly instruction. This is primarily for very standard instructions such as movs, add, SR and other simple bit-manipulation and arithmetic instructions. I'm hoping to do this for ARM and x86-64 at the same time, writing as little as possible in either assembly language.

If the thing I'm describing doesn't exist, then I wonder if there's some sort of "assembly language" that I can use to write my code and then compile that assembly into x86-64 and ARM.


Answer 1:


To clearly answer this part:

... then I wonder if there's some sort of "assembly language" that I can use to write my code and then compile that assembly into x86-64 and ARM.

That's exactly what LLVM IR is targeting.

The LLVM representation aims to be light-weight and low-level while being expressive, typed, and extensible at the same time. It aims to be a “universal IR” of sorts, by being at a low enough level that high-level ideas may be cleanly mapped to it (similar to how microprocessors are “universal IR’s”, allowing many source languages to be mapped to them).

For example:

You can represent this C function

int mul_add(int x, int y, int z) {
  return x * y + z;
}

with this LLVM IR

define i32 @mul_add(i32 %x, i32 %y, i32 %z) {
entry:
  %tmp = mul i32 %x, %y
  %tmp2 = add i32 %tmp, %z
  ret i32 %tmp2
}



Answer 2:


To put it pointedly, the "assembly language" you're talking about is ... C.

That's because a lot of C expressions map directly to single assembly instructions, even on different platforms. The following is partly hypothetical, but it shows some of the instructions a given C expression may compile to on x86, ARM or SPARC (choosing those three because those are the ones I know best):


    C code         x86 asm                   ARM asm          SPARC asm

    {              enter                     push {lr}        save %sp, ..., %sp
    }              leave                     pop {pc}         restore
    a += b;        add %ebx, %eax            add R0, R1       add %l0, %l1, %l0
    a = b + c;     lea (%ebx, %ecx), %eax    add R0, R1, R2   add %l2, %l1, %l0
    a = 0;         xor %eax, %eax            mov R0, #0       clr %l0
    a++;           inc %eax                  add R0, #1       inc %l0
    a--;           dec %eax                  sub R0, #1       dec %l0
    (*ptr)++;      incl (%eax)               -                -
    a = ~b;        mov %ebx, %eax; not %eax  mvn R0, R1       not %l1, %l0
    ptr = &a;      lea a, %eax               ldr R0, =a       set a, %l0
    a = b[c];      mov (%ebx, %ecx), %eax    ldr R0, [R1, R2] ld [%l1+%l2], %l0
    (void)func();  call func                 blx func         call func
    if (a)         test %eax, %eax; jnz      tst R0, R0; bne  tst %l0; bnz

Of course, not everything you can write as one line of C code will translate to a single assembly instruction. Whether a multi-term operation can be "flattened" into a single multi-operand assembly instruction, or instead requires a sequence of "more primitive" instructions, also depends strongly on the instruction set.
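As a rough illustration of the bit manipulations the question asks about, here is a minimal sketch of portable C helpers. The names rotl32, lsr32 and bits32 are made up for this example and are not from any library. Optimizing compilers such as GCC and Clang commonly recognize the rotate idiom and emit a single rotate instruction on targets that have one, but that depends on the compiler, its version and the flags used, so check the generated assembly rather than assuming it.

```c
#include <stdint.h>

/* Rotate left by n bits (hypothetical helper name).
   Masking both shift counts keeps them in 0..31, so n == 0 never
   causes an out-of-range (undefined) 32-bit shift. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> (-n & 31));
}

/* Logical shift right with the count masked to 0..31. */
static inline uint32_t lsr32(uint32_t x, unsigned n)
{
    return x >> (n & 31);
}

/* Extract `len` bits starting at bit `pos`.
   Assumes 1 <= len <= 31 and pos <= 31; no checking is done. */
static inline uint32_t bits32(uint32_t x, unsigned pos, unsigned len)
{
    return (x >> pos) & ((UINT32_C(1) << len) - 1);
}
```

Using unsigned fixed-width types here matters: shifts on signed values have implementation-defined or undefined corner cases that can stop the compiler from choosing the obvious instruction.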

C compilers have long used an intermediate representation before the final conversion to assembly; the step is similar to what x86 CPUs do these days in hardware when they "compile" x86 instructions into lower-level micro-ops for the chip's actual execution units. Having that intermediate layer codified and documented, as has happened with LLVM IR, is not that new either: Java bytecode and Forth conceptually fit the same schema.

I'd go for C ... and look at the assembly output. It is likely to be as compact as it can be already, and on platforms where a corresponding "compound" operation is available, it may well be more compact than the LLVM IR (say, on a CPU with an integer multiply-accumulate instruction, such as ARM's MLA, the separate mul and add in the mul_add example from the LLVM IR answer can collapse into a single instruction).
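To make "write C and look at the assembly output" concrete, here is a minimal sketch of a switch-based interpreter step of the kind the question describes. The opcode set, the struct cpu layout and the step function are all invented for this example; a real emulator would decode them from guest instructions.

```c
#include <stdint.h>

/* Hypothetical guest opcodes, invented for this sketch. */
enum guest_op { OP_MOV, OP_ADD, OP_SHR, OP_NOT };

/* Hypothetical guest state: four 32-bit registers. */
struct cpu { uint32_t r[4]; };

/* Execute one guest instruction on registers rd and rs.
   Each case body is a single C expression, so an optimizing compiler
   has a good chance of turning it into very few host instructions
   (plus the loads and stores of the register array). */
static void step(struct cpu *c, enum guest_op op, unsigned rd, unsigned rs)
{
    switch (op) {
    case OP_MOV: c->r[rd] = c->r[rs];           break;
    case OP_ADD: c->r[rd] += c->r[rs];          break;
    case OP_SHR: c->r[rd] >>= (c->r[rs] & 31);  break;
    case OP_NOT: c->r[rd] = ~c->r[rs];          break;
    }
}
```

Compiling a file like this with something along the lines of cc -O2 -S for each target, then reading the generated .s file, shows what each case actually became on x86-64 and on ARM.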




Answer 3:


If you want to emit machine code at run time, you need a just-in-time (JIT) translation library. You might consider GNU lightning, libjit, LLVM, libgccjit, asmjit ...

You could also (on Linux) generate some C code into a file, fork a compilation of that file into a shared object, and then load that .so plugin with dlopen(3).

As I commented: cross-platform assembly does not exist and cannot exist (because systems have different instruction sets and ABI conventions): consider instead generating C code, or perhaps LLVM IR code.
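The "generate C code" suggestion can be sketched as a small emitter that turns one guest instruction into one line of C text. Everything here is hypothetical: the opcode names, the emit_c function, and the assumption that the generated code operates on an array named r. Writing the text to a file, compiling it and loading the result are left out.

```c
#include <stdio.h>

/* Hypothetical guest opcodes, invented for this sketch. */
enum gen_op { G_MOV, G_ADD, G_SHR };

/* Write one C statement for a guest instruction into buf.
   Returns what snprintf returns (the length the text needs),
   or -1 for an opcode this sketch does not know. */
static int emit_c(char *buf, size_t cap, enum gen_op op,
                  unsigned rd, unsigned rs)
{
    switch (op) {
    case G_MOV: return snprintf(buf, cap, "r[%u] = r[%u];", rd, rs);
    case G_ADD: return snprintf(buf, cap, "r[%u] += r[%u];", rd, rs);
    case G_SHR: return snprintf(buf, cap, "r[%u] >>= (r[%u] & 31);", rd, rs);
    }
    return -1;
}
```

The appeal of this approach is that the host C compiler, not the emulator author, deals with the instruction set and ABI of each target.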

If you are writing some interpreter (and that includes many emulators), consider also threaded code techniques and bytecode generation.
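As a rough sketch of the idea, the following pre-decodes a program into an array of handler pointers, so the run loop dispatches through the pointer instead of a switch. This is a portable relative of threaded code built on function pointers; classic direct-threaded interpreters often rely on the GNU C "labels as values" extension instead, which is not shown here. The struct vm layout, the handler names and the instruction format are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VM state: one accumulator, a program counter, a stop flag. */
struct vm { int32_t acc; size_t pc; int halted; };

typedef void (*handler_fn)(struct vm *, int32_t operand);

/* One handler per bytecode operation. */
static void h_load(struct vm *v, int32_t x) { v->acc = x; }
static void h_add(struct vm *v, int32_t x)  { v->acc += x; }
static void h_shr(struct vm *v, int32_t x)
{
    /* Shift as unsigned so the result is a logical shift. */
    v->acc = (int32_t)((uint32_t)v->acc >> (x & 31));
}
static void h_halt(struct vm *v, int32_t x) { (void)x; v->halted = 1; }

/* A pre-decoded instruction: the handler to call plus its operand. */
struct insn { handler_fn fn; int32_t operand; };

/* Run until a handler sets the halted flag; returns the accumulator.
   The program must end with an h_halt instruction. */
static int32_t run(const struct insn *prog)
{
    struct vm v = {0, 0, 0};
    while (!v.halted) {
        const struct insn *i = &prog[v.pc++];
        i->fn(&v, i->operand);
    }
    return v.acc;
}
```

A program is then just an array such as { {h_load, 40}, {h_add, 2}, {h_shr, 1}, {h_halt, 0} } passed to run.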



Source: https://stackoverflow.com/questions/25484505/are-there-c-functions-or-macros-specifically-designed-to-compile-1-to-1-with-ass
