How can I write self-modifying code that runs efficiently on modern x64 processors?
问题 I'm trying to speed up a variable-bitwidth integer compression scheme and I'm interested in generating and executing assembly code on-the-fly. Currently a lot of time is spent on mispredicted indirect branches, and generating code based on the series of bitwidths as found seems to be the only way avoid this penalty. The general technique is referred to as "subroutine threading" (or "call threading", although this has other definitions as well). The goal is to take advantage of the processors