Fastest way to test a 128 bit NEON register for a value of 0 using intrinsics?

Posted by 蓝咒 on 2019-12-01 04:28:36

While this answer may be a bit late, there is a simple way to do the test with only 3 instructions and no extra registers:

inline uint32_t is_not_zero(uint32x4_t v)
{
    uint32x2_t tmp = vorr_u32(vget_low_u32(v), vget_high_u32(v));
    return vget_lane_u32(vpmax_u32(tmp, tmp), 0);
}

The return value will be nonzero if any bit in the 128-bit NEON register was set.
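To try the function above away from an ARM board, here is a minimal sketch that wraps it in a helper and falls back to a scalar model of the same three steps when NEON is not available. The names `any_nonzero` and `check_any_nonzero` are made up for this illustration, and the scalar branch is only an assumption-level model of the lane arithmetic, not a statement about generated code.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>

// Same reduction as in the answer: OR the two halves, then a pairwise max.
inline uint32_t is_not_zero(uint32x4_t v)
{
    uint32x2_t tmp = vorr_u32(vget_low_u32(v), vget_high_u32(v));
    return vget_lane_u32(vpmax_u32(tmp, tmp), 0);
}

// Hypothetical helper: pack four words into a register and run the test.
inline uint32_t any_nonzero(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    const uint32_t w[4] = {a, b, c, d};
    return is_not_zero(vld1q_u32(w));
}
#else
// Scalar model of the same steps, used only when NEON is unavailable.
inline uint32_t any_nonzero(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    uint32_t lane0 = a | c;                 // vorr_u32, lane 0 (low[0] | high[0])
    uint32_t lane1 = b | d;                 // vorr_u32, lane 1 (low[1] | high[1])
    return lane0 > lane1 ? lane0 : lane1;   // vpmax_u32, lane 0
}
#endif

// Hypothetical self-check showing the intended use as a truth value.
inline void check_any_nonzero()
{
    assert(any_nonzero(0, 0, 0, 0) == 0);
    assert(any_nonzero(0, 0, 0, 1) != 0);
}
```

Only zero versus nonzero is meaningful in the result; the exact nonzero value depends on which lanes were set.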

If you're targeting AArch64 NEON, you can use the following to get a value to test with just two instructions:

inline uint64_t is_not_zero(uint32x4_t v)
{
    uint64x2_t v64 = vreinterpretq_u64_u32(v);
    // Saturating narrow: a nonzero 64-bit lane never narrows to 0.
    uint32x2_t v32 = vqmovn_u64(v64);
    uint64x1_t result = vreinterpret_u64_u32(v32);
    return vget_lane_u64(result, 0);
}
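The reason this works is the saturation in `vqmovn_u64`: a 64-bit lane that does not fit in 32 bits clamps to 0xFFFFFFFF instead of losing its upper bits, so a nonzero lane stays nonzero. A rough scalar sketch of that behaviour follows; the function names are invented for the illustration and model one lane at a time.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one lane of vqmovn_u64 (unsigned saturating narrow).
inline uint32_t saturating_narrow(uint64_t x)
{
    return x > UINT32_MAX ? UINT32_MAX : static_cast<uint32_t>(x);
}

// Scalar model of one lane of vmovn_u64 (plain truncating narrow), for contrast.
inline uint32_t truncating_narrow(uint64_t x)
{
    return static_cast<uint32_t>(x);
}

// Hypothetical model of the whole function: narrow both 64-bit lanes and
// view the two 32-bit results as a single 64-bit value.
inline uint64_t is_not_zero_model(uint64_t lane0, uint64_t lane1)
{
    return static_cast<uint64_t>(saturating_narrow(lane0)) |
           (static_cast<uint64_t>(saturating_narrow(lane1)) << 32);
}

// Hypothetical self-check: only the upper half of a lane is set.
inline void check_narrowing()
{
    assert(truncating_narrow(0x100000000ULL) == 0);            // bits lost
    assert(saturating_narrow(0x100000000ULL) == 0xFFFFFFFFu);  // clamped
    assert(is_not_zero_model(0x100000000ULL, 0) != 0);
}
```

A plain truncating narrow would report zero for such an input, which is why the saturating form is the one to use here.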

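If you want to stay close to intrinsics, here is another way. Note that it also relies on GCC/Clang vector extensions for the comparison, the casts, and the subscript: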

inline bool is_zero(int32x4_t v) noexcept
{
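  // Lanes become all-ones where v is nonzero, so the gathered bytes below
  // are all zero only when every lane of v was zero.
  v = v != int32x4_t{};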

  return !int32x2_t(
    vtbl2_s8(
      int8x8x2_t{
        int8x8_t(vget_low_s32(v)),
        int8x8_t(vget_high_s32(v))
      },
      int8x8_t{0, 4, 8, 12}
    )
  )[0];
}

Nils Pipenbrinck's answer has a flaw: it assumes that the QC (cumulative saturation) flag is clear.

If you have AArch64, it is even easier. It has across-lane reduction instructions that suit this test.

inline uint32_t is_not_zero(uint32x4_t v)
{
    // Unsigned maximum across lanes; a sum (vaddvq_u32) could wrap around to 0.
    return vmaxvq_u32(v);
}
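One caveat with across-lane reductions: a wrapping sum such as `vaddvq_u32` can come out as zero for a nonzero vector, whereas an unsigned maximum such as `vmaxvq_u32` is zero only when every lane is zero. A small scalar sketch of the difference, with helper names invented for the illustration:

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of vaddvq_u32: sum of the four lanes, modulo 2^32.
inline uint32_t add_across(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return a + b + c + d;
}

// Scalar model of vmaxvq_u32: largest of the four lanes.
inline uint32_t max_across(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    uint32_t m = a > b ? a : b;
    uint32_t n = c > d ? c : d;
    return m > n ? m : n;
}

// Hypothetical self-check: two lanes that cancel out in the sum.
inline void check_reductions()
{
    assert(add_across(0x80000000u, 0x80000000u, 0, 0) == 0);           // false "zero"
    assert(max_across(0x80000000u, 0x80000000u, 0, 0) == 0x80000000u); // still nonzero
}
```

If the lanes are known to be small enough that the sum cannot overflow, the additive form gives the same answer; otherwise the maximum is the safer choice.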