There are cases where you know that a certain floating-point expression will always be non-negative. For example, when computing the length of a vector, one does sqrt(
Without any global options, here is a (low-overhead, but not free) way to get a square root with no branch:
#include <xmmintrin.h>
float test(float x)
{
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set1_ps(x * x)));
}
(on godbolt)
As usual, Clang is smart about its shuffles. GCC and MSVC lag behind in that area and don't manage to avoid the broadcast. MSVC also does some mysterious moves.
There are other ways to turn a float into an __m128, for example _mm_set_ss. For Clang that makes no difference; for GCC it makes the code a little bigger and worse (including a movss reg, reg, which counts as a shuffle on Intel, so it doesn't even save on shuffles).
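For reference, here is a minimal sketch of what the _mm_set_ss variant might look like. The function name test_set_ss is made up for illustration; the only change from the version above is the intrinsic used to build the __m128.

```c
#include <xmmintrin.h>  /* SSE: _mm_set_ss, _mm_sqrt_ss, _mm_cvtss_f32 */

/* Hypothetical name, not from the original post.
   _mm_set_ss puts the value in the lowest lane and zeroes the upper three,
   whereas _mm_set1_ps broadcasts it to all four lanes. Only the lowest
   lane matters here, because _mm_sqrt_ss operates on that lane alone and
   _mm_cvtss_f32 extracts it. */
float test_set_ss(float x)
{
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x * x)));
}
```

Both variants return the same value. The difference is purely in what the compiler has to emit to set up the upper lanes, which is where the codegen differences described above come from.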