I'm looking for the fastest way to test if a 128-bit NEON register contains all zeros, using NEON intrinsics. I'm currently using 3 OR operations, and 2 MOVs:
I'd avoid functions that return an integer value meant to be interpreted only as a bool. A better approach is, for instance, to define a helper function that returns the maximum unsigned value across the 4 lanes:
inline uint32_t max_lane_value_u32(const uint32x4_t& v)
{
#if defined(_WIN32) && defined(_ARM64_)
    // Windows 64-bit
    return neon_umaxvq32(v);
#elif defined(__LP64__)
    // Linux/Android 64-bit
    return vmaxvq_u32(v);
#else
    // Windows/Linux/Android 32-bit
    uint32x2_t result = vmax_u32(vget_low_u32(v), vget_high_u32(v));
    return vget_lane_u32(vpmax_u32(result, result), 0);
#endif
}
You can then use:
if (0 == max_lane_value_u32(v))
{
    ...
}
in your code, and such a function might also be useful elsewhere. Alternatively, you can use the exact same code to write an is_not_zero() function, but then it is better form to return a bool.
Note that the only reason you need to define a helper function at all is that vmaxvq_u32() is not available on 32-bit ARM, and may not be aliased to neon_umaxvq32() in arm64_neon.h on Windows.
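For the bool-returning variant, a minimal sketch could look like the block below. It reuses the same max-reduction idea and simply compares the result against zero. A few things here are my own assumptions rather than part of the answer: the architecture check uses `__aarch64__` and `__ARM_NEON`, the Windows-specific branch is left out for brevity, and `is_not_zero_array()` is a hypothetical wrapper with a scalar stand-in so the sketch also builds on a host without NEON.

```cpp
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

// Returns true if any of the 4 lanes is non-zero.
// Assumption: __aarch64__ selects the across-vector max; every other
// NEON target takes the pairwise-max path used in the 32-bit branch above.
inline bool is_not_zero(const uint32x4_t& v)
{
#if defined(__aarch64__)
    return vmaxvq_u32(v) != 0;
#else
    // Fold 4 lanes to 2, then take the pairwise max of those 2.
    uint32x2_t m = vmax_u32(vget_low_u32(v), vget_high_u32(v));
    return vget_lane_u32(vpmax_u32(m, m), 0) != 0;
#endif
}

// Hypothetical convenience wrapper: load four values and test them.
inline bool is_not_zero_array(const uint32_t (&lanes)[4])
{
    return is_not_zero(vld1q_u32(lanes));
}
#else
// Scalar stand-in (not NEON) so the sketch compiles on other hosts.
inline bool is_not_zero_array(const uint32_t (&lanes)[4])
{
    return (lanes[0] | lanes[1] | lanes[2] | lanes[3]) != 0;
}
#endif
```

The call site then reads as a plain condition, e.g. `if (!is_not_zero(v)) { ... }`, without any integer result that has to be interpreted as a bool.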