Is there any single instruction or function that can invert the sign of every float inside a __m128?
i.e. a = r0:r1:r2:r3 ===> a = -r0:-r1:-r2:-r3
I know this can be done with _mm_sub_ps(_mm_set1_ps(0.0), a), but isn't it potentially slow, since _mm_set1_ps(0.0) is a multi-instruction function?
In practice your compiler should do a good job of generating the constant vector for 0.0. It will probably just zero a register with a single xorps instruction (the one behind _mm_xor_ps), and if your code is in a loop it should hoist the constant generation out of the loop anyway. So, bottom line, use your original idea of:
v = _mm_sub_ps(_mm_set1_ps(0.0), v);
or another common trick, which is:
v = _mm_xor_ps(v, _mm_set1_ps(-0.0));
which just flips the sign bit of each element instead of doing a subtraction. It is not quite as safe as the first method, since it treats NaNs as plain bit patterns rather than passing them through floating-point arithmetic, but it may be more efficient in some cases.
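To make the two approaches concrete, here is a minimal sketch in C. The function names negate_xor, negate_sub and negate_array are hypothetical helpers invented for this illustration, and the array version assumes the element count is a multiple of 4. It is meant as an illustration of the idea, not a tuned implementation.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_xor_ps, _mm_sub_ps, _mm_set1_ps, ... */

/* Hypothetical helper: negate by flipping the sign bit of each lane.
   -0.0f has only the sign bit set, so the XOR touches nothing else. */
static inline __m128 negate_xor(__m128 v) {
    return _mm_xor_ps(v, _mm_set1_ps(-0.0f));
}

/* Hypothetical helper: negate by subtracting each lane from zero.
   This goes through ordinary floating-point arithmetic. */
static inline __m128 negate_sub(__m128 v) {
    return _mm_sub_ps(_mm_setzero_ps(), v);
}

/* Hypothetical helper: negate n floats in place.
   Assumes n is a multiple of 4; unaligned loads/stores are used so the
   pointer does not need 16-byte alignment. */
static void negate_array(float *data, int n) {
    /* The mask is built once, outside the loop. */
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    for (int i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);
        _mm_storeu_ps(data + i, _mm_xor_ps(v, sign_mask));
    }
}
```

One observable difference between the two helpers: for an input lane of +0.0, negate_xor yields -0.0, while negate_sub yields +0.0 under the default rounding mode, because 0.0 - 0.0 is +0.0 in IEEE 754 arithmetic.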
Source: https://stackoverflow.com/questions/20083997/how-to-negate-change-sign-of-the-floating-point-elements-in-a-m128-type-vari