Fastest way to test a 128 bit NEON register for a value of 0 using intrinsics?

天命终不由人 2021-01-12 13:58

I'm looking for the fastest way to test whether a 128-bit NEON register contains all zeros, using NEON intrinsics. I'm currently using 3 OR operations and 2 MOVs:
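A reduction of that general shape (OR the register's two halves together, then move the surviving lanes to general-purpose registers and OR them there) might look like the sketch below. It is an illustration under assumptions, not the asker's exact code: the names `all_zero_or` and `all_zero_or_lanes` are hypothetical, and the scalar fallback exists only so the check can be exercised on a host without NEON.

```cpp
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

// Hypothetical helper: true when all 128 bits of the register are zero.
inline bool all_zero_or(uint32x4_t v) noexcept
{
  // OR the low and high 64-bit halves together.
  const uint32x2_t folded = vorr_u32(vget_low_u32(v), vget_high_u32(v));
  // Move both remaining lanes to general-purpose registers and OR them.
  return (vget_lane_u32(folded, 0) | vget_lane_u32(folded, 1)) == 0;
}
#endif

// Hypothetical wrapper: loads four lanes from memory and runs the check.
inline bool all_zero_or_lanes(const std::uint32_t (&lanes)[4]) noexcept
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  return all_zero_or(vld1q_u32(lanes));
#else
  // Scalar fallback (assumption: only for builds without NEON).
  return (lanes[0] | lanes[1] | lanes[2] | lanes[3]) == 0;
#endif
}
```

For a single register this takes two ORs and two lane moves; an extra OR is only needed when two registers are combined before the test.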

         


        
5 answers
  •  北海茫月
    2021-01-12 14:45

    You seem to be looking for intrinsics; here is one way to do it:

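    #include <arm_neon.h>

    // Relies on GCC/Clang vector extensions (!=, [] and casts on NEON types).
    inline bool is_zero(int32x4_t v) noexcept
    {
      // Each lane becomes -1 (all bits set) if it was nonzero, 0 otherwise.
      v = v != int32x4_t{};

      // Gather byte 0 of each 32-bit lane (bytes 0, 4, 8, 12) into the first
      // 32-bit lane of the result; it is 0 only if every input lane was 0.
      return !int32x2_t(
        vtbl2_s8(
          int8x8x2_t{
            int8x8_t(vget_low_s32(v)),
            int8x8_t(vget_high_s32(v))
          },
          int8x8_t{0, 4, 8, 12}
        )
      )[0];
    }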
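The function above depends on compiler vector extensions. Where those are not available, the same test can be sketched with plain ACLE intrinsics only. The names `is_zero_acle` and `is_zero_lanes` are hypothetical, the AArch64 branch assumes `vmaxvq_u32` is available (it is an A64-only intrinsic), and the scalar fallback is there only so the sketch builds on hosts without NEON.

```cpp
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

// Hypothetical helper using only intrinsics (no vector-extension operators).
inline bool is_zero_acle(int32x4_t v) noexcept
{
  const uint32x4_t u = vreinterpretq_u32_s32(v);
#if defined(__aarch64__)
  // AArch64: the unsigned maximum across lanes is 0 only if every lane is 0.
  return vmaxvq_u32(u) == 0;
#else
  // ARMv7: OR the halves, then a pairwise max folds both lanes into lane 0.
  const uint32x2_t folded = vorr_u32(vget_low_u32(u), vget_high_u32(u));
  return vget_lane_u32(vpmax_u32(folded, folded), 0) == 0;
#endif
}
#endif

// Hypothetical wrapper: loads four lanes from memory and runs the check.
inline bool is_zero_lanes(const std::int32_t (&lanes)[4]) noexcept
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  return is_zero_acle(vld1q_s32(lanes));
#else
  // Scalar fallback (assumption: only for builds without NEON).
  return (lanes[0] | lanes[1] | lanes[2] | lanes[3]) == 0;
#endif
}
```

Which variant is fastest depends on the core and on how expensive NEON-to-general-purpose register transfers are there, so it is worth measuring on the target.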
    

    Nils Pipenbrinck's answer has a flaw: it assumes that the QC (cumulative saturation) flag is clear beforehand.
