Why do some SSE “mov” instructions specify that they move floating-point values?

↘锁芯ラ posted on 2019-11-30 17:17:36
Josh Haberman

I think I've found the answer: some microarchitectures execute floating-point instructions on different execution units than integer instructions. You get better overall latency when a stream of instructions stays within the same "domain" (integer or floating point). This is covered in pretty good detail in Agner Fog's optimization manual, in the section titled "Data Bypass Delays": http://www.agner.org/optimize/microarchitecture.pdf
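To make the "domain" idea concrete, here is a minimal sketch in C intrinsics, not taken from the manual. It clears the sign bit of four floats twice: once with float-typed intrinsics (which usually map to MOVUPS/ANDPS) and once with integer-typed intrinsics (which usually map to MOVDQU/PAND). The function names are made up for illustration, and a compiler is free to pick a different encoding than the one the intrinsic suggests. The results are bit-identical; only the execution domain, and therefore the possible bypass delay, differs.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE float intrinsics */
#include <string.h>

/* Hypothetical helper: clear the sign bit of four floats while staying in the
   floating-point domain (float load, float AND, float store). */
static void abs_float_domain(const float *in, float *out) {
    const __m128 mask = _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff));
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_and_ps(v, mask));
}

/* Hypothetical helper: the same bit operation expressed in the integer domain
   (integer load, integer AND, integer store). */
static void abs_int_domain(const float *in, float *out) {
    const __m128i mask = _mm_set1_epi32(0x7fffffff);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_and_si128(v, mask));
}

/* Returns 1 when both versions produce bit-identical output. */
static int domains_agree(void) {
    const float in[4] = {-1.5f, 2.0f, -0.0f, -3.25f};
    float a[4], b[4];
    abs_float_domain(in, a);
    abs_int_domain(in, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

If the surrounding code is doing float arithmetic on these values, the first version is the one that stays in a single domain on microarchitectures that distinguish them.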

I found this explanation in this similar SO question: Difference between MOVDQA and MOVAPS x86 instructions?

In case anyone cares, this is exactly why Agner Fog's vectorclass has separate Boolean vector classes for use with float vectors (Vec4fb) and with integer vectors (Vec4ib): http://www.agner.org/optimize/#vectorclass

In his manual he writes: "The reason why we have defined a separate Boolean vector class for use with floating point vectors is that it enables us to produce faster code. (Many modern CPU's have separate execution units for integer vectors and floating point vectors. It is sometimes possible to do the Boolean operations in the floating point unit and thereby avoid the delay from moving data between the two units)."
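As a rough illustration of what that quote describes (this is a sketch with plain intrinsics, not the vectorclass implementation), the compare mask below is kept as a `__m128` and combined with the float-typed AND/ANDNOT/OR, so the Boolean work never has to leave the floating-point domain. The function name `select_smaller` is hypothetical.

```c
#include <xmmintrin.h>  /* SSE float intrinsics */

/* Hypothetical helper: per-lane "a < b ? a : b" using a float-domain mask. */
static void select_smaller(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 mask = _mm_cmplt_ps(va, vb);           /* all-ones in lanes where a < b */
    __m128 from_a = _mm_and_ps(mask, va);         /* keep a where the mask is set */
    __m128 from_b = _mm_andnot_ps(mask, vb);      /* keep b where the mask is clear */
    _mm_storeu_ps(out, _mm_or_ps(from_a, from_b));
}
```

Casting the mask to `__m128i` and using the integer logic intrinsics would give the same bits, but on CPUs with separate integer and float vector units it may add the transfer delay the quote mentions.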

Most questions about SSE and AVX can be answered by reading his manual and, more importantly, by looking at the code in his vectorclass.
