What does “rep; nop;” mean in x86 assembly? Is it the same as the “pause” instruction?

Submitted by 吃可爱长大的小学妹 on 2019-11-27 06:22:00
ughoavgfhw

rep; nop is indeed the same as the pause instruction (opcode F3 90). It might be used with assemblers that don't support the pause mnemonic yet. On older processors, it simply did nothing, just like nop but in two bytes. On newer processors that support hyperthreading, it serves as a hint to the processor that you are executing a spin loop, which improves performance. From Intel's instruction reference:

Improves the performance of spin-wait loops. When executing a “spin-wait loop,” a Pentium 4 or Intel Xeon processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops.
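
As a rough illustration (not taken from Intel's manual), here is a minimal sketch of a spin-wait loop in C that issues the hint on every iteration. It assumes GCC or Clang inline-asm syntax and C11 atomics; cpu_relax, demo_spinlock and the spin_* functions are hypothetical names made up for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: emit the spin-loop hint.
   "rep; nop" assembles to the bytes F3 90, the same encoding as "pause",
   so it also works with assemblers that predate the pause mnemonic.
   On non-x86 targets this sketch falls back to a plain compiler barrier. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Hypothetical minimal spinlock, for illustration only. */
typedef struct {
    atomic_bool locked;
} demo_spinlock;

/* Try to take the lock once; returns true on success. */
static inline bool spin_trylock(demo_spinlock *l)
{
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static inline void spin_lock(demo_spinlock *l)
{
    while (!spin_trylock(l)) {
        /* Spin read-only until the lock looks free, hinting the CPU
           on each iteration that this is a spin-wait loop. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            cpu_relax();
    }
}

static inline void spin_unlock(demo_spinlock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The inner loop only reads the flag and calls cpu_relax(), so the hint sits exactly where the quoted text recommends: inside the spin-wait loop itself.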

Prefixes that don't apply to an instruction are ignored. However, future CPUs can use that byte sequence to encode a new instruction. (Yes, the x86 opcode space is so limited that they do crazy stuff like this, and yes, it makes the decoders complicated.)

In this case, it means you can use pause in spinloops without breaking backwards compat. Old CPUs that don't know about pause will decode it as a NOP with no harm done. On new CPUs, you get the benefit of power-saving / HT friendliness, and avoiding memory-ordering mis-speculation when the memory you're spinning on does change and you're leaving the spin loop.


Links to Intel's manuals and tons of other good stuff on the x86 tag wiki info page: https://stackoverflow.com/tags/x86/info

Another case of a meaningless rep prefix becoming a new instruction on new CPUs: lzcnt is F3 0F BD /r. On CPUs that don't support that instruction (missing the LZCNT feature flag in their CPUID), it decodes as rep bsr, which runs the same as bsr. So on old CPUs, it produces 31 - expected_result (for 32-bit operand size), and the result is undefined when the input was zero.
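
To make the relationship between the two results concrete, here is a small sketch in C with software models of both instructions for 32-bit operands. model_lzcnt32 and model_bsr32 are hypothetical helper names, not real intrinsics; they only mimic the documented results (leading-zero count vs. index of the highest set bit).

```c
#include <assert.h>
#include <stdint.h>

/* Software model of 32-bit lzcnt: number of leading zero bits.
   Returns 32 when x == 0, as the real instruction does. */
static unsigned model_lzcnt32(uint32_t x)
{
    unsigned n = 0;
    for (uint32_t mask = UINT32_C(1) << 31; mask != 0 && (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* Software model of 32-bit bsr: index of the highest set bit.
   The real instruction leaves its destination undefined for x == 0,
   so this model simply refuses that input. */
static unsigned model_bsr32(uint32_t x)
{
    assert(x != 0);
    unsigned idx = 0;
    while (x >>= 1)
        idx++;
    return idx;
}
```

For any nonzero x, model_bsr32(x) equals 31 - model_lzcnt32(x), so code that executes the lzcnt encoding on a CPU without LZCNT gets the bit index back instead of the leading-zero count.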


One case of a meaningless rep prefix that will probably never decode differently: rep ret is used by default by gcc when targeting "generic" CPUs (i.e. not targeting a specific CPU with -march or -mtune, and not targeting AMD K8 or K10). It will be decades before anyone could make a CPU that decodes rep ret as anything other than ret, because it's present in most binaries in most Linux distros. See What does `rep ret` mean?
