LLVM Backend: Replacing indirect jmps for the x86 backend

Submitted by 让人想犯罪 __ on 2019-12-05 22:12:43

I am done with my project. Posting my approach for the benefit of others.

The main job of the LLVM backend is to convert the Intermediate Representation (IR) into target-specific machine code, depending on the target architecture and other specifications. The backend itself consists of several phases that perform target-specific optimization, instruction selection, scheduling, and instruction emission. These phases are needed because the IR is a very generic representation and has to go through many transformations before it becomes target-specific code.

1) Logging every time the compiler generates jmp *(eax)

We can achieve this by adding print statements to the instruction emitting/printing phase. After most of the conversion from IR is done, there is an AsmPrinter pass that walks over every machine instruction in each basic block of every function. The main loop is AsmPrinter::EmitFunctionBody() in lib/CodeGen/AsmPrinter/AsmPrinter.cpp. There are related functions such as EmitFunctionEpilogue and EmitFunctionPrologue. These functions eventually call EmitInstruction for the specific architecture, e.g. lib/Target/X86/X86AsmPrinter.cpp. With a bit of tinkering, you can call MI->getOpcode() and compare the result against the opcode enums defined for the architecture to print a log message.

For example, the opcode for a jump through a register on X86 is X86::JMP64r. You can get the associated register operand with MI->getOperand(0), and further operands the same way.

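// Inside the target's EmitInstruction(const MachineInstr *MI); dbgs() comes from llvm/Support/Debug.h.
if (MI->getOpcode() == X86::JMP64r)
  dbgs() << "Found jmp *x instruction\n";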
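To make the logging idea concrete without pulling in an LLVM build, here is a minimal standalone sketch. It does not use LLVM's real classes: Opcode, ToyInstr and logIndirectJumps are hypothetical stand-ins that only mimic the shape of the check above (compare the opcode, then read operand 0 as the jump register).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for LLVM's X86 opcode enum and MachineInstr.
// They are NOT LLVM APIs; they only model the shape of the check.
enum class Opcode { MOV64rm, JMP64r, RET };

struct ToyInstr {
  Opcode opcode;
  std::vector<std::string> operands; // for JMP64r, operand 0 is the target register
};

// Walks a basic block and returns one log line per register-indirect jump.
std::vector<std::string> logIndirectJumps(const std::vector<ToyInstr> &block) {
  std::vector<std::string> log;
  for (const ToyInstr &mi : block) {
    if (mi.opcode != Opcode::JMP64r)
      continue; // not the instruction we are looking for
    assert(!mi.operands.empty() && "JMP64r needs a register operand");
    log.push_back("Found jmp *%" + mi.operands[0]);
  }
  return log;
}
```

Calling logIndirectJumps on a block holding a mov, a JMP64r on rax and a ret yields a single entry for the jump. In a real backend the same comparison would sit inside EmitInstruction and write to dbgs() instead of returning strings.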

2) Replacing the instruction

The required changes depend on the kind of replacement you need. If you need more context about registers or previous instructions, the changes have to be implemented higher up in the pass chain. There is a representation of instructions called the SelectionDAG (a directed acyclic graph), which stores the dependencies of each instruction on earlier instructions. For example, in the sequence

mov myvalue,%rax
jmp *%rax

the DAG would have the jmp node pointing to the mov node (and possibly other nodes before it), since the value of rax depends on the mov instruction. You can replace the node here with the nodes you need. If done correctly, this changes the final emitted instructions. The SelectionDAG code is at lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp. It is always best to poke around first to find the ideal place for the change. Each IR statement goes through multiple transformations before the DAG is topologically sorted so that the instructions form a linear sequence. The graphs can be viewed with the -view-*-dags options (for example -view-isel-dags) listed by llc --help-hidden.

In my case, I just added a specific check in EmitInstruction and added code to emit the two instructions that I wanted.
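The emit-time replacement can be sketched as a standalone toy model. Everything here is an assumption made for illustration: Opcode, ToyInstr, emitInstruction and emitBlock are hypothetical and not LLVM APIs, and because the post does not say which two instructions were emitted, the pair used below (an address mask followed by the jump) is an invented placeholder.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for LLVM's opcode enum and MachineInstr (not LLVM APIs).
enum class Opcode { MOV64rm, JMP64r, RET };

struct ToyInstr {
  Opcode opcode;
  std::vector<std::string> operands; // MOV64rm: {dest reg, source}; JMP64r: {target reg}
};

// Emits AT&T-style text for one instruction. The JMP64r case is the special
// check: it expands the single jump into two instructions.
void emitInstruction(const ToyInstr &mi, std::vector<std::string> &out) {
  switch (mi.opcode) {
  case Opcode::JMP64r: {
    assert(mi.operands.size() == 1 && "JMP64r takes one register operand");
    const std::string reg = "%" + mi.operands[0];
    out.push_back("andq $-32, " + reg); // placeholder first instruction (assumed)
    out.push_back("jmp *" + reg);       // the jump itself
    break;
  }
  case Opcode::MOV64rm:
    assert(mi.operands.size() == 2 && "MOV64rm takes a register and a source");
    out.push_back("mov " + mi.operands[1] + ", %" + mi.operands[0]);
    break;
  case Opcode::RET:
    out.push_back("ret");
    break;
  }
}

// Emits a whole basic block, one instruction at a time, in program order.
std::vector<std::string> emitBlock(const std::vector<ToyInstr> &block) {
  std::vector<std::string> out;
  for (const ToyInstr &mi : block)
    emitInstruction(mi, out);
  return out;
}
```

The point of the design is that every other instruction flows through unchanged, and only the one opcode of interest is intercepted at the last stage. That keeps the change small, at the cost of having no SelectionDAG context about how the register value was produced.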

The LLVM documentation is always there, but I found two articles by Eli Bendersky more helpful than any other resource: "Life of an instruction in LLVM" and "A deeper look into the LLVM code generator". The articles also discuss the very complex TableGen descriptions and the instruction matching process, which is kind of cool if you are interested.
