Fastest way to test a 128 bit NEON register for a value of 0 using intrinsics?

天命终不由人 2021-01-12 13:58

I'm looking for the fastest way to test whether a 128-bit NEON register contains all zeros, using NEON intrinsics. I'm currently using 3 OR operations and 2 MOVs:

         


        
5 Answers
  •  误落风尘
    2021-01-12 14:41

    I'd avoid functions that return integer values which should only be interpreted as bool. A better way would be, for instance, to define a helper function that returns the maximum unsigned value of the 4 lanes:

    inline uint32_t max_lane_value_u32(const uint32x4_t& v)
    {
    #if defined(_WIN32) && defined(_ARM64_)
        // Windows 64-bit
        return neon_umaxvq32(v);
    #elif defined(__LP64__)
        // Linux/Android 64-bit
        return vmaxvq_u32(v);
    #else
        // Windows/Linux/Android 32-bit
        uint32x2_t result = vmax_u32(vget_low_u32(v), vget_high_u32(v));
        return vget_lane_u32(vpmax_u32(result, result), 0);
    #endif
    }
    

    You can then use:

    if (0 == max_lane_value_u32(v))
    {
        ...
    }
    

    in your code, and such a function might also be useful elsewhere. Alternatively, you can use the exact same code to write an is_not_zero() function, but then it is better form to return a bool.
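
    A minimal sketch of that bool-returning variant might look like the following. The names is_not_zero() and is_not_zero_lanes() are hypothetical. The sketch assumes GCC/Clang-style feature macros (__ARM_NEON, __aarch64__) rather than the Windows checks used above. The scalar branch is not part of the NEON approach and is there only so the snippet compiles on non-ARM hosts.

```cpp
#include <cstdint>

// Assumption: GCC/Clang-style feature macros; MSVC would need its own checks.
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

// Hypothetical helper: true when at least one lane of v is non-zero.
inline bool is_not_zero(const uint32x4_t& v)
{
#if defined(__aarch64__)
    // AArch64: horizontal maximum across all four lanes.
    return vmaxvq_u32(v) != 0;
#else
    // ARMv7: maximum of the low and high halves, then a pairwise maximum.
    uint32x2_t half = vmax_u32(vget_low_u32(v), vget_high_u32(v));
    return vget_lane_u32(vpmax_u32(half, half), 0) != 0;
#endif
}
#endif

// Hypothetical wrapper that tests four 32-bit lanes stored in memory.
inline bool is_not_zero_lanes(const uint32_t lanes[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return is_not_zero(vld1q_u32(lanes));
#else
    // Scalar fallback for non-NEON builds (illustration only).
    return (lanes[0] | lanes[1] | lanes[2] | lanes[3]) != 0;
#endif
}
```

    With this, the call site reads as a plain boolean test, for example if (!is_not_zero(v)) for the all-zero case, with no comparison against an integer.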

    Note that the only reason you'd need to define a helper function is because vmaxvq_u32() is not available on 32-bit, and may not be aliased from neon_umaxvq32() in arm64_neon.h on Windows.
