How to execute a call instruction with a 64-bit absolute address?

旧城冷巷雨未停 submitted on 2020-06-08 06:14:05

Question


I am trying to call a function, which should have an absolute address once compiled and linked, from machine code. I am creating a function pointer to the desired function and trying to pass that to the call instruction, but I noticed that the call instruction takes at most a 16- or 32-bit address. Is there a way to call an absolute 64-bit address?

I am deploying for the x86-64 architecture and using NASM to generate the machine code.

I could work with a 32-bit address if I could be guaranteed that the executable would be for sure mapped to the bottom 4GB of memory, but I am not sure where I could find that information.

Edit: I cannot use the callf instruction, as that requires me to disable 64-bit mode.

Second Edit: I also do not want to store the address in a register and call the register, as this is performance critical, and I cannot have the overhead and performance hit of an indirect function call.

Final Edit: I was able to use the rel32 call instruction by ensuring that my machine code was mapped into the first 2GB of memory. This was achieved through mmap with the MAP_32BIT flag (I'm using Linux):

MAP_32BIT (since Linux 2.4.20, 2.6) Put the mapping into the first 2 Gigabytes of the process address space. This flag is supported only on x86-64, for 64-bit programs. It was added to allow thread stacks to be allocated somewhere in the first 2GB of memory, so as to improve context-switch performance on some early 64-bit processors. Modern x86-64 processors no longer have this performance problem, so use of this flag is not required on those systems. The MAP_32BIT flag is ignored when MAP_FIXED is set.
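For reference, here is a minimal C sketch of that allocation, assuming Linux on x86-64 (MAP_32BIT does not exist elsewhere, so the sketch returns NULL there). The function name alloc_jit_buffer_low is hypothetical, and mapping the buffer writable and executable at the same time is a simplification; a JIT might instead map it read/write and mprotect it to read/execute after emitting code.

```c
#define _GNU_SOURCE            /* expose MAP_32BIT / MAP_ANONYMOUS in glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Allocate an RWX buffer in the first 2 GiB of the address space so that
   code placed in it can use call rel32 to reach targets that are also in
   the low 2 GiB. Returns NULL on failure or on platforms without MAP_32BIT. */
void *alloc_jit_buffer_low(size_t size)
{
    assert(size > 0);
#ifdef MAP_32BIT
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#else
    (void)size;
    return NULL;               /* MAP_32BIT is Linux/x86-64 only */
#endif
}
```

Machine code written into such a buffer can then use call rel32 to reach functions that are themselves in the low 2GiB, such as those in a non-PIE Linux executable.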


Answer 1:


Related: "Handling calls to (potentially) far away ahead-of-time compiled functions from JITed code" has more about JITing, especially allocating your JIT buffer near the code it wants to call so you can use an efficient call rel32, and what to do if you can't.

Also, "Call an absolute pointer in x86 machine code" is a good canonical Q&A about call or jmp to an absolute address.
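The idea of allocating the JIT buffer near the code it calls can be sketched in C for Linux/POSIX mmap. This is a minimal sketch under assumptions: the names alloc_exec_near and in_rel32_range are hypothetical, the 64 MiB step between hint addresses is arbitrary, and mapping memory writable and executable at once is a simplification that some hardened systems forbid. Without MAP_FIXED the address passed to mmap is only a hint, so every mapping is checked and discarded if it landed out of rel32 range.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* True if a rel32 displacement measured from `from` can reach `to`. */
bool in_rel32_range(uintptr_t from, uintptr_t to)
{
    int64_t d = (int64_t)(to - from);   /* modular difference read as signed */
    return d >= INT32_MIN && d <= INT32_MAX;
}

/* Try to map `size` bytes of RWX memory close enough to `target` that any
   call rel32 placed in the buffer can reach it. Returns NULL if no hint works. */
void *alloc_exec_near(uintptr_t target, size_t size)
{
    assert(size > 0);
    const uintptr_t step = (uintptr_t)64 << 20;          /* 64 MiB, arbitrary */
    const uintptr_t limit = ((uintptr_t)1 << 31) - step; /* stay inside +-2 GiB */
    for (uintptr_t dist = step; dist < limit; dist += step) {
        const uintptr_t hints[2] = { target - dist, target + dist };
        for (int i = 0; i < 2; i++) {
            void *hint = (void *)(hints[i] & ~(uintptr_t)0xFFF); /* page-align */
            void *p = mmap(hint, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                continue;
            uintptr_t lo = (uintptr_t)p, hi = lo + size;
            if (in_rel32_range(lo, target) && in_rel32_range(hi, target))
                return p;                                /* whole buffer in range */
            munmap(p, size);                             /* kernel put it too far away */
        }
    }
    return NULL;
}
```

Passing the address of a function the JIT code will call (for example a C helper in the main executable) as target gives a buffer from which call rel32 can reach it; if NULL comes back, an indirect call is the fallback.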


TL;DR: To call a function by name, just use call func like a normal person and let the assembler + linker take care of it. Since you say you're using NASM, I guess you're actually generating the machine code with an assembler. It sounded like a more complicated question, but I think you were just asking whether the normal way was safe.


Indirect call r/m64 (FF /2) takes a 64-bit register or memory operand in 64-bit mode.

So you can do

func equ  0x123456789ab  ; an absolute address
; (func could also be an ordinary label)

mov   rax, func          ; mov r64, imm64, or mov r32, imm32 if it fits
call  rax                ; FF /2: indirect call through RAX

Normally you'd put a label address into a register with lea rax, [rel func], but if that's encodable (the label is within ±2GiB of RIP), then you'd just use call rel32 directly.
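A JIT that writes raw machine code instead of running NASM can emit the same mov / call rax sequence byte by byte. A minimal C sketch (the function name encode_mov_call_rax is hypothetical): it uses the 5-byte mov eax, imm32 when the address fits in 32 bits (writing EAX zero-extends into RAX) and the 10-byte mov rax, imm64 otherwise, followed by the 2-byte call rax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode an indirect call through RAX to an absolute 64-bit address:
     mov eax, imm32   ; B8 id     (5 bytes)  if target fits in 32 bits
     mov rax, imm64   ; 48 B8 io  (10 bytes) otherwise
     call rax         ; FF D0     (FF /2, ModRM mod=11 reg=2 rm=rax)
   Writes at most 12 bytes into buf and returns the number written.
   RAX is clobbered; it is call-clobbered in the SysV and Windows x64 ABIs. */
size_t encode_mov_call_rax(uint8_t *buf, uint64_t target)
{
    assert(buf != NULL);
    size_t n = 0;
    if (target <= UINT32_MAX) {
        buf[n++] = 0xB8;                          /* mov eax, imm32 */
        for (int i = 0; i < 4; i++)
            buf[n++] = (uint8_t)(target >> (8 * i));
    } else {
        buf[n++] = 0x48;                          /* REX.W prefix */
        buf[n++] = 0xB8;                          /* mov rax, imm64 */
        for (int i = 0; i < 8; i++)
            buf[n++] = (uint8_t)(target >> (8 * i));
    }
    buf[n++] = 0xFF;                              /* call r/m64 */
    buf[n++] = 0xD0;                              /* ModRM selecting RAX */
    return n;
}
```

For the example address 0x123456789ab this writes 12 bytes: 48 B8 AB 89 67 45 23 01 00 00 FF D0, compared with 5 bytes for a call rel32.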


Or, if you know what address your machine code will be stored at, you can use the normal direct call rel32 encoding (E8 rel32). The displacement is relative to the end of the call instruction: rel32 = target - (address of the next instruction).

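If you don't want to use an indirect call, then the rel32 encoding is your only option. Make sure your machine code goes into the low 2GiB so it can reach targets that are also in the low 2GiB. Note that rel32 only spans ±2GiB from the end of the call, so code in the low 2GiB can't necessarily reach every address in the low 4GiB (from 0x1000 you can't reach 0xE0000000, for example).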


if I could be guaranteed that the executable would be for sure mapped to the bottom 4GB of memory

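Only in some configurations. AMD64 call / jump instructions, and RIP-relative addressing, only have rel32 encodings, so Linux, Windows, and OS X all default to the "small" code model (usually in its position-independent form), where code and static data fit within a 2GiB span, so it's guaranteed that the linker can just fill in a rel32 to reach up to 2G forward or 2G backward. Only the non-PIC small model, as used for non-PIE Linux executables, additionally puts code and static data in the low 2GiB. PIE executables (the default on many modern Linux distros), 64-bit Windows EXEs (default image base 0x140000000, above 4GiB), and OS X executables (a 4GiB __PAGEZERO segment keeps them out of the low 4GiB) don't give you that.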
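To see the small-code-model guarantee in practice, this hypothetical C check compares the addresses of two functions from the same executable image. The rel32 check should hold for any default code model; whether code_is_in_low_2g() returns true depends on how the program was built and loaded. On x86-64 Linux one would expect true for a -no-pie build and false for a PIE build, but treat that as an expectation rather than a guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two ordinary functions that end up in the same executable image. */
int helper_a(int x) { return x + 1; }
int helper_b(int x) { return x * 2; }

/* True if a rel32 displacement measured from `from` can reach `to`. */
bool in_rel32_range(uintptr_t from, uintptr_t to)
{
    int64_t d = (int64_t)(to - from);
    return d >= INT32_MIN && d <= INT32_MAX;
}

/* Holds under the small code model: code in one image spans less than
   2 GiB, so a call rel32 inside helper_a could reach helper_b. */
bool same_image_in_rel32_range(void)
{
    return in_rel32_range((uintptr_t)&helper_a, (uintptr_t)&helper_b);
}

/* Not guaranteed: depends on PIE vs non-PIE, the OS, and ASLR. */
bool code_is_in_low_2g(void)
{
    return (uintptr_t)&helper_a < ((uintptr_t)1 << 31);
}
```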

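The x86-64 System V ABI does define Medium and Large code models (the Large model lets code live anywhere, at the cost of 64-bit absolute addresses), but I don't know whether anyone actually uses the Large model, because of the inefficiency of addressing data and making calls.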


Re: efficiency: yes, mov / call rax is less efficient. I think it's significantly slower if branch prediction misses and can't provide a target prediction from the BTB. However, even call rel32 and jmp rel32 still need the BTB for full performance. See "Slow jmp-instruction" for experimental results showing relative jmp next_insn slowing down when there are too many of them in a giant loop.

With hot branch predictors, the indirect version is only extra code size and an extra uop (the mov). It might consume more prediction resources, but maybe not even that.

See also "What branch misprediction does the Branch Target Buffer detect?"



Source: https://stackoverflow.com/questions/38961192/how-to-execute-a-call-instruction-with-a-64-bit-absolute-address
