I'm looking for the fastest way to test whether a 128-bit NEON register contains all zeros, using NEON intrinsics. I'm currently using three OR operations and two MOVs.
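If you're targeting AArch64 NEON, you can get a single 64-bit value to test with just two instructions: a saturating narrow (UQXTN) followed by a move to a general-purpose register. Because the narrow saturates rather than truncates, any non-zero 64-bit lane stays non-zero in the 32-bit result: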
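    #include <arm_neon.h>
    #include <stdint.h>

    // Returns a non-zero value if any bit of v is set, 0 if v is all zeros.
    static inline uint64_t is_not_zero(uint32x4_t v)
    {
        // View the register as two 64-bit lanes.
        uint64x2_t v64 = vreinterpretq_u64_u32(v);
        // Saturating narrow (UQXTN): a non-zero 64-bit lane stays non-zero in 32 bits.
        uint32x2_t v32 = vqmovn_u64(v64);
        // Reinterpret the two 32-bit lanes as one 64-bit lane and extract it.
        uint64x1_t result = vreinterpret_u64_u32(v32);
        return vget_lane_u64(result, 0);
    }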
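
To see why the saturating narrow is safe here where a plain truncating narrow would not be, a scalar model may help. This is only a sketch that runs on any platform: `saturating_narrow_u64` and `model_is_not_zero` are hypothetical names made up for illustration, not NEON intrinsics, and the model assumes the usual little-endian lane order (lane 0 ends up in the low 32 bits).

```c
#include <stdint.h>

/* Scalar model of one lane of UQXTN (what vqmovn_u64 does per lane):
   narrow a 64-bit value to 32 bits, clamping to UINT32_MAX instead of
   discarding the high bits. */
static uint32_t saturating_narrow_u64(uint64_t x)
{
    return x > UINT32_MAX ? UINT32_MAX : (uint32_t)x;
}

/* Hypothetical helper: models is_not_zero() for a 128-bit value passed
   as two 64-bit halves (lo = lane 0, hi = lane 1).
   Returns 0 only when both halves are 0. */
static uint64_t model_is_not_zero(uint64_t lo, uint64_t hi)
{
    uint64_t narrowed_lo = saturating_narrow_u64(lo);
    uint64_t narrowed_hi = saturating_narrow_u64(hi);
    /* Pack the two 32-bit lanes into one 64-bit result. */
    return narrowed_lo | (narrowed_hi << 32);
}
```

For example, a lane holding 0x100000000 has all of its low 32 bits clear, so truncation would turn it into 0 and the test would wrongly report "all zeros". With saturation it becomes 0xFFFFFFFF, so the packed result is still non-zero.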