Translation of machine code into LLVM IR (disassembly / reassembly of x86-64, x86, ARM into LLVM bitcode)

Posted by 元气小坏坏 on 2019-11-26 11:57:03

Question


I would like to translate x86-64, x86, and ARM executables into LLVM IR (disassembly).

What solution do you suggest?


Answer 1:


mcsema is a production-quality binary lifter. It takes x86 and x86-64 machine code and statically "lifts" it to LLVM IR. It is actively maintained, BSD licensed, and has extensive tests and documentation.

https://github.com/trailofbits/mcsema
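
To give a rough idea of what "lifting" means, here is a toy sketch in Python. It is not mcsema's algorithm or output format. It assumes a hypothetical four-register machine, models each register as a stack slot (`alloca`), zero-initializes the slots for simplicity, and ignores flags, memory operands, and control flow, all of which a real lifter has to handle. The names `lift`, `REGS`, and `OPS` are made up for illustration.

```python
# Toy lifter: turns a few register-only x86-64-style instructions into LLVM IR text.
# Assumptions: only mov/add/sub/and/or/xor, two operands, no flags, no memory operands.
REGS = ("rax", "rbx", "rcx", "rdx")
OPS = {"add": "add", "sub": "sub", "and": "and", "or": "or", "xor": "xor"}


def lift(asm_lines, name="lifted"):
    out = [f"define i64 @{name}() {{", "entry:"]
    # Model every machine register as a stack slot, zero-initialized for simplicity.
    for reg in REGS:
        out.append(f"  %{reg} = alloca i64")
        out.append(f"  store i64 0, ptr %{reg}")

    counter = 0

    def tmp():
        # Fresh SSA value name for each intermediate result.
        nonlocal counter
        counter += 1
        return f"%t{counter}"

    for line in asm_lines:
        mnemonic, operands = line.split(None, 1)
        dst, src = [o.strip() for o in operands.split(",")]
        if dst not in REGS:
            raise ValueError(f"unsupported destination: {dst}")
        if src in REGS:
            val = tmp()
            out.append(f"  {val} = load i64, ptr %{src}")
        else:
            val = str(int(src, 0))  # immediate operand
        if mnemonic == "mov":
            out.append(f"  store i64 {val}, ptr %{dst}")
        elif mnemonic in OPS:
            old, new = tmp(), tmp()
            out.append(f"  {old} = load i64, ptr %{dst}")
            out.append(f"  {new} = {OPS[mnemonic]} i64 {old}, {val}")
            out.append(f"  store i64 {new}, ptr %{dst}")
        else:
            raise ValueError(f"unsupported instruction: {mnemonic}")

    # Treat rax as the return value, as in the usual x86-64 calling conventions.
    ret = tmp()
    out.append(f"  {ret} = load i64, ptr %rax")
    out.append(f"  ret i64 {ret}")
    out.append("}")
    return "\n".join(out)


ir = lift(["mov rax, 5", "mov rbx, 7", "add rax, rbx"])
print(ir)
```

The emitted text uses opaque pointers (`ptr`), so it assumes a reasonably recent LLVM. Keeping registers in memory slots and letting LLVM's own passes promote them to SSA values is one common simplification; real lifters typically use a much richer machine state structure.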




Answer 2:


Consider using the RevGen tool developed within the S2E project. It converts x86 binaries to LLVM IR. The source code can be checked out from the Revgen branch of the Git repository at https://dslabgit.epfl.ch/git/s2e/s2e.git.




Answer 3:


Regarding the RevGen tool mentioned by @bsa2000, the recent paper "A compiler level intermediate representation based binary analysis and rewriting system" points out some limitations of S2E and Revnic.

I quote them here.

  1. Shortcoming of dynamic translation:

    S2E [16] and Revnic [14] present a method for dynamically translating x86 to LLVM using QEMU. Unlike our approach, these methods convert blocks of code to LLVM on the fly which limits the application of LLVM analyses to only one block at a time.

  2. Incomplete IR:

    Revnic [14] and RevGen [15] recover an IR by merging the translated blocks, but the recovered IR is incomplete and is only valid for current execution; consequently, various whole program analyses will provide incomplete information.

  3. No abstract stack or promotion of memory locations to symbols:

    Further, the translated code retains all the assumptions of the original binary about the stack layout. They do not provide any methods for obtaining an abstract stack or promoting memory locations to symbols, which are essential for the application of several source-level analyses.
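
The second limitation quoted above (IR recovered from one execution is incomplete) can be illustrated with a small Python sketch. It does not use S2E, Revnic, or RevGen; the control-flow graph, block names, and trace below are entirely hypothetical. It only shows that merging the blocks seen in a single run yields a subset of the blocks and edges a whole-program analysis would need.

```python
# Hypothetical static CFG: block name -> list of successor blocks.
CFG = {
    "entry": ["check"],
    "check": ["fast_path", "slow_path"],
    "fast_path": ["exit"],
    "slow_path": ["error", "exit"],
    "error": ["exit"],
    "exit": [],
}


def recover_from_trace(trace):
    """Merge the blocks seen in one execution into a partial CFG."""
    blocks = set(trace)
    edges = set(zip(trace, trace[1:]))  # consecutive blocks form an observed edge
    return blocks, edges


def static_edges(cfg):
    """All edges of the full CFG, for comparison."""
    return {(src, dst) for src, succs in cfg.items() for dst in succs}


# One concrete run that happens to take the fast path.
trace = ["entry", "check", "fast_path", "exit"]
blocks, edges = recover_from_trace(trace)

missing_blocks = set(CFG) - blocks
missing_edges = static_edges(CFG) - edges

print("blocks never translated:", sorted(missing_blocks))
print("edges never observed:", sorted(missing_edges))
```

In this toy run the `slow_path` and `error` blocks are never translated, so any analysis over the recovered IR would not see them at all.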




Answer 4:


I doubt there will be a universal solution (think about indirect branches, etc.); LLVM IR is much "higher level" than any assembly language. It is, however, possible to translate on a per-basic-block (BB) basis. You might want to check out the llvm-qemu and libcpu projects, among others.
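
As a minimal sketch of why indirect branches get in the way of static translation, the Python snippet below splits a made-up instruction listing into basic blocks using the classic "leader" rule. The `Insn` type, the instruction kinds, and the sample program are all hypothetical, and no real disassembler is involved. Direct jumps contribute their targets as block starts, but the indirect jump has no statically known target, so the block boundaries (and the control-flow graph) may be incomplete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Insn:
    addr: int
    text: str
    kind: str = "plain"           # "plain", "jump", "cond_jump", "indirect_jump", "ret"
    target: Optional[int] = None  # statically known branch target, if any


def find_leaders(insns):
    """Return (sorted basic-block start addresses, addresses of unresolved indirect jumps)."""
    leaders = {insns[0].addr}  # the first instruction always starts a block
    unresolved = []
    for i, insn in enumerate(insns):
        if insn.kind == "plain":
            continue
        # The instruction following any control transfer starts a new block.
        if i + 1 < len(insns):
            leaders.add(insns[i + 1].addr)
        if insn.kind in ("jump", "cond_jump") and insn.target is not None:
            leaders.add(insn.target)  # direct target is known statically
        elif insn.kind == "indirect_jump":
            unresolved.append(insn.addr)  # target depends on runtime register/memory values
    return sorted(leaders), unresolved


program = [
    Insn(0x00, "cmp rdi, 3"),
    Insn(0x04, "ja 0x20", "cond_jump", 0x20),
    Insn(0x06, "jmp [rax + rdi*8]", "indirect_jump"),
    Insn(0x0D, "mov eax, 1"),
    Insn(0x12, "ret", kind="ret"),
    Insn(0x20, "xor eax, eax"),
    Insn(0x22, "ret", kind="ret"),
]

leaders, unresolved = find_leaders(program)
print("block starts:", [hex(a) for a in leaders])
print("indirect jumps with unknown targets:", [hex(a) for a in unresolved])
```

Each block found this way can be translated on its own, which is the per-BB approach mentioned above; what remains unknown is where the jump at `0x06` can land, so the edges out of that block cannot be filled in without further analysis or runtime information.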




Answer 5:


There is a new project, still in its early phases, called libbeauty: https://github.com/jcdutton/libbeauty

Article about the project: Libbeauty: Another Reverse-Engineering Tool, 24 December 2013, Michael Larabel - http://www.phoronix.com/scan.php?page=news_item&px=MTU1MTU

It currently supports only a subset of x86_64 as input. One of the project's goals is to be able to compile the generated LLVM IR back to assembly to get a binary with the same functionality.




Answer 6:


Here is a reference on translating ARM binaries to LLVM IR:

disarm - ARM binary to LLVM IR disassembler

https://code.google.com/p/disarm/

However, I have not tried it, so I am not sure about its quality and stability. Can anyone else post additional information about this project?



Source: https://stackoverflow.com/questions/6981810/translation-of-machinecode-into-llvm-ir-disassembly-reassembly-of-x86-64-x86
