Bypass delays when switching execution unit domains

Submitted by 末鹿安然 on 2019-12-04 04:21:45
Leeor

Section 2.1.4 of the Intel optimization guide indicates that you (and Agner) are quite right on this matter:

When a source of a micro-op executed in one stack comes from a micro-op executed in another stack, a one- or two-cycle delay can occur. The delay occurs also for transitions between Intel SSE integer and Intel SSE floating-point operation.

So in general, it seems you are better off staying within the same stack/domain as much as possible, since each crossing on a dependency chain can add one or two cycles of latency.
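As a rough illustration of staying in one domain, here is a minimal sketch using SSE2 intrinsics. Both functions clear the sign bit of four floats and return the same result; the first uses a floating-point-domain AND, while the second casts to integer and uses an integer-domain AND. The function names are hypothetical, and the compiler is free to pick a different instruction than the intrinsic suggests, so check the generated assembly before drawing conclusions.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: |x| for four floats.
   The mask is applied with _mm_and_ps, which maps to the
   floating-point-domain instruction andps, so a value produced by
   an FP add or multiply can stay in the FP domain. */
static __m128 abs_ps_fp_domain(__m128 x)
{
    const __m128 mask = _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff));
    return _mm_and_ps(x, mask);
}

/* Hypothetical helper: same result, but the mask is applied with
   _mm_and_si128, which maps to the integer-domain instruction pand.
   If x comes from an FP operation and the result feeds another FP
   operation, the data may cross domains twice.
   The casts generate no instructions; they only change the C type. */
static __m128 abs_ps_int_domain(__m128 x)
{
    const __m128i mask = _mm_set1_epi32(0x7fffffff);
    return _mm_castsi128_ps(_mm_and_si128(_mm_castps_si128(x), mask));
}
```

The two versions are interchangeable in terms of output, so the choice only affects which execution domain the bitwise operation lands in.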

Of course, benchmarking is always preferable, and all of this is worth addressing only if it is actually a bottleneck in your code.
