Fastest way to test a 128 bit NEON register for a value of 0 using intrinsics?

天命终不由人 2021-01-12 13:58

I'm looking for the fastest way to test whether a 128-bit NEON register contains all zeros, using NEON intrinsics. I'm currently using 3 OR operations and 2 MOVs:

5 answers
  •  一整个雨季
    2021-01-12 14:45

    If you're targeting AArch64 NEON, you can use the following to get a value to test with just two instructions:

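    #include <arm_neon.h>

    // Returns a nonzero value if any bit of v is set, and zero otherwise.
    static inline uint64_t is_not_zero(uint32x4_t v)
    {
        uint64x2_t v64 = vreinterpretq_u64_u32(v);
        // Saturating narrow: a nonzero 64-bit lane stays nonzero in 32 bits.
        uint32x2_t v32 = vqmovn_u64(v64);
        uint64x1_t result = vreinterpret_u64_u32(v32);
        // vget_lane_u64 reads the lane without relying on compiler-specific vector subscripting.
        return vget_lane_u64(result, 0);
    }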
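
To see why the saturating narrow is safe here, a scalar model may help. The sketch below is only an illustration in portable C, not NEON code: `sat_narrow_u64` and `is_not_zero_model` are hypothetical names, and the two `uint64_t` arguments stand in for the low and high 64-bit lanes of the register. A plain truncating narrow would lose a lane such as `1 << 32`, while the saturating one clamps it to `0xFFFFFFFF`, so the packed result is zero only when both lanes are zero.

```c
#include <stdint.h>

/* Scalar model of one lane of an unsigned saturating narrow (64 -> 32 bits).
   Values that do not fit in 32 bits clamp to UINT32_MAX instead of wrapping. */
static uint32_t sat_narrow_u64(uint64_t x)
{
    return x > UINT32_MAX ? UINT32_MAX : (uint32_t)x;
}

/* Hypothetical scalar stand-in for is_not_zero(): lo and hi model the two
   64-bit lanes of the 128-bit register. The narrowed lanes are packed into
   one 64-bit value, with lane 0 in the low 32 bits. */
static uint64_t is_not_zero_model(uint64_t lo, uint64_t hi)
{
    uint64_t n_lo = sat_narrow_u64(lo);
    uint64_t n_hi = sat_narrow_u64(hi);
    return n_lo | (n_hi << 32);
}
```

Calling `is_not_zero_model(0, 0)` gives 0, while any input with at least one set bit, including bits that sit only in the upper half of a lane, gives a nonzero result.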
    
