Dummy operations handling of Intel processor

Submitted by 自古美人都是妖i on 2019-12-01 09:19:16

Question


Admittedly, this is a somewhat silly question. I am wondering whether Intel processors provide any special mechanism for efficiently executing a series of dummy instructions, i.e., NOPs. For instance, I could imagine some kind of prefetch mechanism that identifies NOPs, discards them, and fetches useful instructions instead. Or are NOPs dispatched to the execution units like normal instructions, meaning that I can process roughly 5 NOPs per cycle (assuming there are 5 execution units)?

Thanks, Reinhard


Answer 1:


Discarding them would be a pretty bad idea: they are often used for busy-waiting. If the processor discarded NOPs, your wait loop would become much tighter than intended and could introduce considerable communication overhead.
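To illustrate the kind of wait loop meant here, below is a minimal sketch of a NOP-based delay in C. It assumes a compiler that accepts GCC-style inline assembly (GCC or Clang); the function name `nop_delay` is made up for this example and does not come from the original answer.

```c
#include <stddef.h>

/* Hypothetical helper: executes `count` NOP instructions in a loop.
 * Assumes GCC/Clang inline-assembly syntax. The `volatile` qualifier
 * keeps the compiler from deleting the otherwise "useless" NOP, so the
 * loop body is really emitted. Returns the number of iterations run. */
static size_t nop_delay(size_t count)
{
    size_t done = 0;
    for (size_t i = 0; i < count; ++i) {
        __asm__ __volatile__("nop");
        ++done;
    }
    return done;
}
```

The delay only scales with `count` because the processor really does process every NOP; how long one iteration takes depends on the CPU and is not something this sketch tries to pin down.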

If you feel that NOPs are inefficient, you could try HLT, which saves some energy. Or you could even send the CPU into a sleep state. However, these only make sense if you want to "do nothing" for a considerable amount of time, and they usually require supervisor privileges.




Answer 2:


No. They are decoded and executed as normal instructions. There is hardware support to remove the false dependency on the EAX register that the single-byte NOP, 0x90 (which is really xchg eax, eax), would otherwise introduce, but that's all.
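The remark that 0x90 "is really xchg eax, eax" follows from the instruction encoding: the short form of XCHG with the accumulator is the single byte 0x90 plus the register number, and EAX is register 0. The small C sketch below just does that arithmetic; the enum and function names are invented for illustration.

```c
#include <stdint.h>

/* 32-bit general-purpose register numbers as used in x86 encodings. */
enum x86_reg { EAX = 0, ECX = 1, EDX = 2, EBX = 3,
               ESP = 4, EBP = 5, ESI = 6, EDI = 7 };

/* Hypothetical helper: opcode byte of the one-byte "xchg eax, reg" form,
 * which is encoded as 0x90 + register number. */
static uint8_t xchg_eax_opcode(enum x86_reg reg)
{
    return (uint8_t)(0x90 + (unsigned)reg);
}
```

`xchg_eax_opcode(EAX)` gives 0x90, the same byte as the one-byte NOP, while `xchg_eax_opcode(ECX)` gives 0x91.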

Reference: Intel(R) 64 and IA-32 Architectures Optimization Reference Manual - section 3.5.1.8, "Using NOPs".




Answer 3:


There's very little need to optimize sequences of no-ops on the x86 architecture, because it has no-op encodings of varying lengths. Instead of many one-byte no-ops, one can just use a single multi-byte no-op. That is somewhat more work for the decoder, but the actual execution units only see a single instruction to execute.
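As a rough sketch of how padding with multi-byte no-ops might look, the C code below fills a buffer using the longest encoding that still fits. The byte sequences are the 1- to 9-byte forms recommended in Intel's documentation of the NOP instruction; the table name `NOP_TABLE` and the function `fill_nops` are hypothetical and only serve this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Multi-byte NOP encodings of 1..9 bytes, as recommended in Intel's
 * documentation of the NOP instruction. Row i holds the (i+1)-byte form;
 * unused trailing bytes in a row are zero and never copied. */
static const uint8_t NOP_TABLE[9][9] = {
    {0x90},
    {0x66, 0x90},
    {0x0F, 0x1F, 0x00},
    {0x0F, 0x1F, 0x40, 0x00},
    {0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},
    {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

/* Hypothetical helper: fills buf[0..len) with NOPs, always taking the
 * longest encoding that still fits, and returns how many instructions
 * were written. */
static size_t fill_nops(uint8_t *buf, size_t len)
{
    size_t count = 0;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        size_t n = len < 9 ? len : 9;   /* length of the next NOP */
        memcpy(buf, NOP_TABLE[n - 1], n);
        buf += n;
        len -= n;
        ++count;
    }
    return count;
}
```

Padding 20 bytes this way produces three instructions (9 + 9 + 2 bytes) instead of twenty one-byte NOPs.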



Source: https://stackoverflow.com/questions/2123000/dummy-operations-handling-of-intel-processor
