Implicit SIMD (SSE/AVX) broadcasts with GCC

江枫思渺然 submitted on 2019-12-01 05:12:21

I think there is currently no direct way and you have to work around it using the syntax you already noticed:

__m256 zero={};
__m256 x=zero+3.14159f;

It may change in the future if we can agree on a good syntax, see PR 55726.

Note that if you want to create a vector { s, s, ... s } with a non-constant float s, the technique above only works with integers, or with floats and -fno-signed-zeros, because zero+s yields +0.0 when s is -0.0. You can tweak it to __m256 x=s-zero; which works unless you use -frounding-math. A last version, suggested by Z boson, is __m256 x=(zero+1.f)*s; which should work in most cases (except possibly with a compiler paranoid about sNaN).
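To make the signed-zero point concrete, here is a minimal sketch of the three variants with a run-time scalar s. It assumes GCC's generic vector extension and uses a 16-byte type v4sf as a stand-in for __m128, so it builds without -mavx; the same expressions apply to __m256. The function names and the is_exact_broadcast helper are hypothetical, introduced only for this illustration, and the behaviour shown assumes the default rounding mode and no -ffast-math.

```c
#include <assert.h>
#include <math.h>   /* signbit, isnan */

/* Four packed floats; an assumed stand-in for __m128. */
typedef float v4sf __attribute__((vector_size(16)));

/* zero + s: (+0.0) + (-0.0) is +0.0, so the sign of a negative zero is lost. */
static v4sf broadcast_add(float s) {
    v4sf zero = {0.0f, 0.0f, 0.0f, 0.0f};
    return zero + s;
}

/* s - zero: (-0.0) - (+0.0) is -0.0 under round-to-nearest. */
static v4sf broadcast_sub(float s) {
    v4sf zero = {0.0f, 0.0f, 0.0f, 0.0f};
    return s - zero;
}

/* (zero + 1) * s: multiplying by 1.0 keeps the sign of zero. */
static v4sf broadcast_mul(float s) {
    v4sf zero = {0.0f, 0.0f, 0.0f, 0.0f};
    return (zero + 1.0f) * s;
}

/* Returns 1 when every lane equals s and carries the same sign bit as s. */
static int is_exact_broadcast(v4sf v, float s) {
    assert(!isnan(s));  /* NaN never compares equal, so it is out of scope here */
    for (int i = 0; i < 4; ++i) {
        if (v[i] != s || !signbit(v[i]) != !signbit(s))
            return 0;
    }
    return 1;
}
```

Calling is_exact_broadcast(broadcast_add(-0.0f), -0.0f) should report a mismatch, while the s-zero and (zero+1.f)*s forms should preserve the negative zero.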

Z boson

It turns out that with a precise floating-point model (e.g. -O3 without -ffast-math), GCC cannot simplify x+0 to x because of signed zero, so x = zero+3.14159f produces inefficient code. However, GCC can simplify 1.0*x to just x, so the efficient solution in this case is:

__m256 x = ((__m256){} + 1)*3.14159f;

https://godbolt.org/g/5QAQkC

See this answer for more details.


A simpler solution is just x = 3.14159f - (__m256){}, because x - 0 = x irrespective of signed zero.
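To compare the code generated for the three constant forms yourself, a small sketch like the following can be compiled with something like gcc -O3 -S and the assembly of each function inspected. It again assumes GCC's generic vector extension with a 16-byte v4sf type in place of __m256 so that no -mavx flag is needed; the function names are hypothetical. It makes no claim about what any particular GCC version emits, only that all three return the same value.

```c
/* Four packed floats; an assumed stand-in for __m128 / __m256. */
typedef float v4sf __attribute__((vector_size(16)));

/* zero + c: the form reported above as producing inefficient code. */
v4sf pi_add(void) {
    v4sf zero = {0.0f, 0.0f, 0.0f, 0.0f};
    return zero + 3.14159f;
}

/* (zero + 1) * c: relies on the compiler folding 1.0*x to x. */
v4sf pi_mul(void) {
    v4sf zero = {0.0f, 0.0f, 0.0f, 0.0f};
    return (zero + 1.0f) * 3.14159f;
}

/* c - zero: x - 0 is x regardless of the sign of zero. */
v4sf pi_sub(void) {
    v4sf zero = {0.0f, 0.0f, 0.0f, 0.0f};
    return 3.14159f - zero;
}
```

The functions are deliberately non-static so that each one appears as its own symbol in the assembly output.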
