Absolute addressing for runtime code replacement in x86_64

有些话、适合烂在心里 提交于 2019-12-05 03:41:45

Try using the large code model for x86_64. In gcc this can be selected with -mcmodel=large. The compiler will use 64 bit absolute addressing for both code and data.

You could also add -fno-pic to disallow the generation of position independent code.

Edit: I built a small test app with -mcmodel=large and the resulting binary contains sequences like

400b81:       48 b9 f0 30 60 00 00    movabs $0x6030f0,%rcx
400b88:       00 00 00 
400b8b:       49 b9 d0 09 40 00 00    movabs $0x4009d0,%r9
400b92:       00 00 00 
400b95:       48 8b 39                mov    (%rcx),%rdi
400b98:       41 ff d1                callq  *%r9

which is a load of an absolute 64 bit immediate (in this case an address) followed by an indirect call or an indirect load. The instruction sequence

moveabs $variable, %rbx
addq %rax, %rbx

is the equivalent to a "leaq offset64bit(%rax), %rbx" (which doesn't exist), with some side effects like flag changing etc.

What you're asking about is doable, but not very easy.

One way to do it is compensate for the code move in its instructions. You need to find all the instructions that use the RIP-relative addressing (they have the ModRM byte of 05h, 0dh, 15h, 1dh, 25h, 2dh, 35h or 3dh) and adjust their disp32 field by the amount of move (the move is therefore limited to +/- 2GB in the virtual address space, which may not be guaranteed given the 64-bit address space is bigger than 4GB).

You can also replace those instructions with their equivalents, most likely replacing every original instruction with more than one, for example:

; These replace the original instruction and occupy exactly as many bytes as the original instruction:
  JMP Equivalent1
  NOP
  NOP
Equivalent1End:

; This is the code equivalent to the original instruction:
Equivalent1:
  Equivalent subinstruction 1
  Equivalent subinstruction 2
  ...
  JMP Equivalent1End

Both methods will require at least some rudimentary x86 disassembly routines.

The former may require the use of VirtualAlloc() on Windows (or some equivalent on Linux) to ensure the memory that contains the patched copy of the original code is within +/- 2GB of that original code. And allocation at specific addresses can still fail.

The latter will require more than just primitive disassemblying, but also full instruction decoding and generation.

There may be other quirks to work around.

Instruction boundaries may also be found by setting the TF flag in the RFLAGS register to make the CPU generate the single-step debug interrupt at the end of execution of every instruction. A debug exception handler will need to catch those and record the value of RIP of the next instruction. I believe this can be done using Structured Exception Handling (SEH) in Windows (never tried with the debug interrupts), not sure about Linux. For this to work you'll have to make all of the code execute, every instruction.

Btw, there's absolute addressing in 64-bit mode, see, for example the MOV to/from accumulator instructions with opcodes from 0A0h through 0A3h.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!