Intel AVX: Why is there no 256-bit version of the dot product for double-precision floating-point variables? [closed]

Question

In another question on SO we tried (and succeeded) to find a way to emulate the instruction that is missing from AVX:

 __m256d _mm256_dp_pd(__m256d m1, __m256d m2, const int mask);

Does anyone know why this instruction is missing? There is a partial answer here.

Answer 1:

The underlying reason for this and various other AVX limitations is that, architecturally, AVX is little more than two SSE execution units side by side. You will notice that virtually no AVX instructions operate horizontally across the boundary between the two 128-bit halves of a vector (which is particularly annoying in the case of vpalignr). In general you effectively just get two 128-bit SSE operations in parallel. That is useful for the majority of instructions, which operate element-wise, but not as useful as a proper 256-bit SIMD implementation.
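To make the lane boundary concrete, here is a minimal sketch of a 4-element double-precision dot product written with existing AVX intrinsics. The helper name `dot4_pd` is hypothetical (it is not an Intel intrinsic, and this is not necessarily the approach from the linked question), and it returns a scalar rather than taking a mask. The multiply and the horizontal add both stay inside each 128-bit half, so the two halves have to be combined with an explicit extract. When the file is compiled without AVX enabled (for example without `-mavx`), the sketch falls back to plain scalar code.

```cpp
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper, not part of the AVX intrinsic set:
// returns a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3].
double dot4_pd(const double* a, const double* b) {
#if defined(__AVX__)
    __m256d x  = _mm256_loadu_pd(a);   // unaligned loads, no alignment assumed
    __m256d y  = _mm256_loadu_pd(b);
    __m256d xy = _mm256_mul_pd(x, y);  // element-wise: {a0*b0, a1*b1, a2*b2, a3*b3}

    // Horizontal add works within each 128-bit half only:
    // h = {xy0+xy1, xy0+xy1, xy2+xy3, xy2+xy3}
    __m256d h = _mm256_hadd_pd(xy, xy);

    // Crossing the lane boundary needs an explicit extract of the upper half.
    __m128d lo = _mm256_castpd256_pd128(h);
    __m128d hi = _mm256_extractf128_pd(h, 1);

    return _mm_cvtsd_f64(_mm_add_sd(lo, hi));
#else
    // Scalar fallback so the sketch still builds when AVX is not enabled.
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

For example, calling `dot4_pd` on `{1, 2, 3, 4}` and `{5, 6, 7, 8}` returns 70. The point of the sketch is the extra `_mm256_extractf128_pd` step: it is exactly the cross-lane work that a hypothetical 256-bit `_mm256_dp_pd` would have had to do in hardware.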

Source: https://stackoverflow.com/questions/16033266/intel-avx-why-is-there-no-256-bits-version-of-dot-product-for-double-precision

Tags

c++

performance

simd

avx
