Fastest way to test a 128-bit NEON register for a value of 0 using intrinsics?
I'm looking for the fastest way to test whether a 128-bit NEON register contains all zeros, using NEON intrinsics. I'm currently using three OR operations and two MOVs:

    uint32x4_t vr = vorrq_u32(vcmp0, vcmp1);
    uint64x2_t v0 = vreinterpretq_u64_u32(vr);
    // fold 128 -> 64 bits
    uint64x1_t v0or = vorr_u64(vget_high_u64(v0), vget_low_u64(v0));
    uint32x2_t v1 = vreinterpret_u32_u64(v0or);
    // fold 64 -> 32 bits in core registers
    uint32_t r = vget_lane_u32(v1, 0) | vget_lane_u32(v1, 1);
    if (r == 0) {
        // do stuff
    }

gcc translates this to the following assembly:

    VORR     q9, q9, q10
    VORR     d16, d18, d19
    VMOV.32  r3, d16[0]
    VMOV.32  r2, d16[1]
    ORRS     r2, r2, r3
    BEQ      ...

Does anyone
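A minimal sketch of one commonly used alternative, offered as an assumption to benchmark rather than a guaranteed win. On AArch64, `vmaxvq_u32` reduces all four lanes with a single across-lanes instruction (UMAXV), and the unsigned maximum is 0 only if every lane is 0. On ARMv7, the 128-bit value is folded to 64 bits with `vorr_u32`, then to 32 bits with a pairwise `vpmax_u32`, so only one lane has to be moved to a core register instead of two. The names `vec128`, `load_u32x4`, `is_all_zero` and `check_is_all_zero` are hypothetical, and the scalar fallback exists only so the sketch compiles and can be tested on non-ARM hosts. Whether this is actually faster than the five-instruction sequence above depends on the core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
typedef uint32x4_t vec128; /* a real 128-bit NEON register */
#else
/* Hypothetical stand-in so the sketch compiles and can be tested off-ARM. */
typedef struct { uint32_t lane[4]; } vec128;
#endif

/* Hypothetical helper: load four 32-bit lanes from memory. */
static inline vec128 load_u32x4(const uint32_t p[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return vld1q_u32(p);
#else
    vec128 v;
    memcpy(v.lane, p, sizeof v.lane);
    return v;
#endif
}

/* Returns true iff every bit of the 128-bit value is zero. */
static inline bool is_all_zero(vec128 v)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#  if defined(__aarch64__)
    /* AArch64: one across-lanes unsigned max (UMAXV); 0 only if all lanes are 0. */
    return vmaxvq_u32(v) == 0;
#  else
    /* ARMv7: fold 128 -> 64 bits with VORR, then 64 -> 32 bits with a
       pairwise max (VPMAX), so only one lane is moved to a core register. */
    uint32x2_t folded = vorr_u32(vget_low_u32(v), vget_high_u32(v));
    uint32x2_t maxed = vpmax_u32(folded, folded);
    return vget_lane_u32(maxed, 0) == 0;
#  endif
#else
    /* Portable fallback: OR the four lanes together. */
    return (v.lane[0] | v.lane[1] | v.lane[2] | v.lane[3]) == 0;
#endif
}

/* Sanity check: an all-zero vector and one with a single set bit in the top lane. */
void check_is_all_zero(void)
{
    const uint32_t zeros[4] = {0, 0, 0, 0};
    const uint32_t one_bit[4] = {0, 0, 0, UINT32_C(1) << 31};
    assert(is_all_zero(load_u32x4(zeros)));
    assert(!is_all_zero(load_u32x4(one_bit)));
}
```

On a NEON target, this would replace everything after the first `vorrq_u32` in the question's code, e.g. `if (is_all_zero(vorrq_u32(vcmp0, vcmp1))) { ... }`. Because `vcmp0` and `vcmp1` appear to be comparison masks, a pairwise max is as good as an OR here: any nonzero lane produces a nonzero maximum.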