New AVX-instructions syntax

百般思念 提交于 2019-12-01 18:43:20
Z boson

Is it always better to use new AVX syntax if possible?

I think the first question is to ask if folder instructions are better than a non-folder instruction pair. Folding takes a pair of read and modify instructions like this

vmovdqa %xmm0, %xmm2
vpunpckhbw %xmm2, %xmm1, %xmm1

and "folds" them into one combined instruction

vpunpckhbw  %xmm0, %xmm1, %xmm2

Since Ivy Bridge a register to register move instruction can have zero latency and can use zero execution ports. However, the unfolded instruction pair still counts as two instructions on the front-end and therefore can affect the overall throughput. The folded instruction however only counts as one instruction in the front-end which lowers the pressure on the front-end without any side effects. This could increase the overall throughput.

However, for memory to register moves the folding can may have a side effect (there is currently some debate about this) even if it lowers pressure on the front-end. The reason is that the out-of-order engine from the front-ends point of view only sees a folded instruction (assuming this answer is correct) and if for some reason it would be more optimal to reorder the memory read operation (since it does require execution ports and has latency) independently from the other operations in the folded instruction the out-of-order engine won't be able to take advantage of this. I observed this for the first time here.

For your particular operation the AVX syntax is always better since it folds the register to register move. However, if you had a memory to register move the folder AVX instruction could perform worse than the unfolded SSE instruction pair in some cases.


Note that, in general, it should still be better to use a vex-encoded instructions. But I think most compilers, if not all, now assume folding is always better so you have no way to control the folding except with assembly (not even with intrinsics) or in some cases by telling the compiler not to compile with AVX.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!