Question
I was going over some assembly code and I saw this:
mov r12, _read_loopr
jmp _bzero
_read_loopr:
...
_bzero:
inc r8
mov byte [r8+r15], 0x0
cmp r8, 0xff
jle _bzero
jmp r12
I was wondering whether there is any particular advantage to doing this (mov the address of _read_loopr into a register, jmp to the function, then jmp back through that register) rather than the usual call _bzero and ret.
Answer 1:
This just looks like braindead code, especially if the return-address label is always right after the jmp _bzero like you say in your comment.
Maybe the author thought that they couldn't use call "because function calls clobber registers". That is what you have to assume, based on the calling convention, when you call a function that isn't part of the same codebase. But within your own code you can call/ret to functions with custom calling conventions, where you decide which registers are preserved.
Of course, for code this small, it should have been inlined (i.e. make it a macro, not a function).
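As a rough sketch of both options (NASM syntax, reusing the loop from the question unchanged): the register contract in the comments and the macro name BZERO_LOOP are my own assumptions for illustration, not something from the original code.

```nasm
; Option 1: call/ret with a private, documented convention.
; Hypothetical contract (an assumption):
;   in:  r15 = buffer base, r8 = index just before the first byte to clear
;   out: r8  = index of the last byte cleared
;   clobbers: flags only; every other register is preserved
_bzero:
    inc     r8
    mov     byte [r8+r15], 0x0
    cmp     r8, 0xff
    jle     _bzero
    ret                         ; return address comes from the stack, so r12 stays free

; Call site: execution resumes at the instruction after the call.
    call    _bzero

; Option 2: inline the loop with a macro, so there is no call or jump at all.
%macro BZERO_LOOP 0
%%next:                         ; %% makes the label unique per expansion
    inc     r8
    mov     byte [r8+r15], 0x0
    cmp     r8, 0xff
    jle     %%next
%endmacro

; Use site: expands to the four-instruction loop in place.
    BZERO_LOOP
```

Either way the caller no longer has to dedicate r12 to holding a return address.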
More importantly, something more clever than storing one byte at a time is normally possible, and probably worth a potential branch mispredict if there are more than a few bytes to zero. If at least 8 (or better, 16) bytes of data always need to be zeroed, you can do it with wide stores. Make the final store write the last byte of the buffer to be zeroed, potentially overlapping with the previous store. (This is much better than ending with branches to decide whether to do a final 4B store, 2B store, and 1B store.) See the x86 tag wiki for resources about writing efficient asm.
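Here is a minimal sketch of the overlapping-store idea in NASM syntax. The labels and the choice of rdi for the buffer start and rcx for the byte count are assumptions for the example, not the registers the question's code uses.

```nasm
; Case 1: 8 <= rcx <= 16 bytes, two 8-byte stores, no loop and no branches.
zero_8_to_16:
    xor     eax, eax            ; writing eax zero-extends, so rax = 0
    mov     [rdi], rax          ; bytes 0..7
    mov     [rdi+rcx-8], rax    ; bytes rcx-8..rcx-1, overlaps the first store when rcx < 16
    ret

; Case 2: rcx >= 16 bytes, 16-byte stores plus one final overlapping store.
; Clobbers rdi, rdx, xmm0 and flags.
zero_16_plus:
    pxor    xmm0, xmm0          ; xmm0 = 0
    lea     rdx, [rdi+rcx-16]   ; address of the last 16 bytes of the buffer
.loop:
    movdqu  [rdi], xmm0         ; unaligned 16-byte store
    add     rdi, 16
    cmp     rdi, rdx
    jb      .loop               ; keep going while a full chunk starts before the tail
    movdqu  [rdx], xmm0         ; final store always ends on the last byte
    ret
```

The final store may rewrite bytes the loop already zeroed, which is harmless and avoids any tail-handling branches on the remaining length.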
If the return address were somewhere other than right after the jmp _bzero, then the worst possible choice would probably be push _read_loopr / jmp _bzero, with a ret in _bzero. A ret with no matching call would break the return-address predictor stack, leading to a mispredict on the next ~15 rets up the call tree.
Best would be to inline the loop and put a direct jmp after it.
I'm not sure how passing an address for _bzero to jmp to would compare with a call/ret and then a jmp after the call.
call/ret are fairly cheap, but not single-uop instructions on Intel. A jmp _bzero / jmp _read_loopr would be better if there was only one caller.
Source: https://stackoverflow.com/questions/38542382/mov-jmp-to-jmp-back-vs-call-ret