Question
I'm using NEON intrinsics with clang.
I want to test two uint32x4_t SIMD values for equality over all lanes.
So not 4 test results, but one single result that tells me if A and B are equal for all lanes.
On Intel AVX, I would use something like:
_mm256_testz_si256( _mm256_xor_si256( A, B ), _mm256_set1_epi64x( -1 ) )
What would be a good way to perform an all-lane equality test for NEON SIMD?
I am assuming I will need intrinsics that operate across lanes. Does ARM Neon have those features?
Answer 1:
Try this:
uint16x4_t t = vqmovn_u32(veorq_u32(a, b));
vget_lane_u64(vreinterpret_u64_u16(t), 0) == 0
I expect the compiler to find target-specific optimizations when implementing that test.
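As a rough sketch of how the two lines above could be wrapped into a reusable test: the helper name all_lanes_equal_narrow, the array-based interface, and the scalar fallback for builds without NEON are assumptions for illustration, not part of the original answer. The saturating narrow matters here because a plain truncating narrow would lose differences that sit only in the upper 16 bits of a lane.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: returns 1 if all four 32-bit lanes of a and b match. */
static int all_lanes_equal_narrow(const uint32_t a[4], const uint32_t b[4])
{
#if defined(__ARM_NEON)
    /* XOR leaves a zero in every lane where the inputs agree. */
    uint32x4_t diff = veorq_u32(vld1q_u32(a), vld1q_u32(b));
    /* Saturating narrow: a nonzero 32-bit lane stays nonzero as 16 bits. */
    uint16x4_t t = vqmovn_u32(diff);
    /* View the four 16-bit lanes as one 64-bit scalar and test for zero. */
    return vget_lane_u64(vreinterpret_u64_u16(t), 0) == 0;
#else
    /* Assumed scalar fallback so the sketch builds without NEON. */
    uint32_t acc = 0;
    for (int i = 0; i < 4; ++i)
        acc |= a[i] ^ b[i];
    return acc == 0;
#endif
}

/* Small self-check; call it from main() or a test runner. */
void check_all_lanes_equal_narrow(void)
{
    const uint32_t a[4] = {1, 2, 3, 4};
    const uint32_t b[4] = {1, 2, 3, 4};
    const uint32_t c[4] = {1, 2, 3, 0x10004}; /* differs only above bit 15 */
    assert(all_lanes_equal_narrow(a, b));
    assert(!all_lanes_equal_narrow(a, c));
}
```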
I just realised something handy...
If you want to test that all lanes are less than some power of two, you can do this by replacing vqmovn_u32() with vqshrn_n_u32(). I believe this can be extended to signed types using vqrshrn_n_s32(), which tests whether every lane lies within +/- a power of two (including the lower bound, excluding the upper bound). For example, you should be able to accept both -1 and 0 in a single test using vqrshrn_n_s32(x, 1).
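The range variants might look like the sketch below. The helper names, the choice of 2^8 as the unsigned bound, the array-based interface, and the scalar fallbacks are assumptions made for illustration. Note that the shift count passed to these intrinsics has to be a compile-time constant.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: 1 if every unsigned lane is below 2^8 = 256. */
static int all_lanes_below_256(const uint32_t x[4])
{
#if defined(__ARM_NEON)
    /* Shift right by 8, then saturate to 16 bits: zero only if x < 256. */
    uint16x4_t t = vqshrn_n_u32(vld1q_u32(x), 8);
    return vget_lane_u64(vreinterpret_u64_u16(t), 0) == 0;
#else
    /* Assumed scalar fallback so the sketch builds without NEON. */
    for (int i = 0; i < 4; ++i)
        if (x[i] >= 256u)
            return 0;
    return 1;
#endif
}

/* Hypothetical helper: 1 if every signed lane is either -1 or 0. */
static int all_lanes_minus1_or_0(const int32_t x[4])
{
#if defined(__ARM_NEON)
    /* Rounding shift computes (x + 1) >> 1, which is 0 only for -1 and 0. */
    int16x4_t t = vqrshrn_n_s32(vld1q_s32(x), 1);
    return vget_lane_u64(vreinterpret_u64_s16(t), 0) == 0;
#else
    /* Assumed scalar fallback so the sketch builds without NEON. */
    for (int i = 0; i < 4; ++i)
        if (x[i] != -1 && x[i] != 0)
            return 0;
    return 1;
#endif
}

/* Small self-check; call it from main() or a test runner. */
void check_range_tests(void)
{
    const uint32_t small[4] = {0, 1, 255, 17};
    const uint32_t big[4] = {0, 256, 0, 0};
    const int32_t ok[4] = {0, -1, -1, 0};
    const int32_t bad[4] = {0, 1, 0, 0};
    assert(all_lanes_below_256(small));
    assert(!all_lanes_below_256(big));
    assert(all_lanes_minus1_or_0(ok));
    assert(!all_lanes_minus1_or_0(bad));
}
```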
Answer 2:
If you just want to know whether two vectors are equal or not, try the following code:
uint32x4_t result = vceqq_u32(a, b);  // each lane: all ones if equal, else 0
if (vminvq_u32(result) != 0xffffffff) {
    // not equal
} else {
    // equal
}
See ARM's manual for the underlying instructions, CMEQ and UMINV. Note that vminvq_u32() is available only on AArch64.
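A self-contained version of the compare-then-reduce approach could look like this. The helper name all_lanes_equal_minv, the array-based interface, and the scalar fallback used when the build does not target AArch64 are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: returns 1 if all four 32-bit lanes of a and b match. */
static int all_lanes_equal_minv(const uint32_t a[4], const uint32_t b[4])
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    /* Each lane becomes 0xffffffff where equal and 0 where different. */
    uint32x4_t eq = vceqq_u32(vld1q_u32(a), vld1q_u32(b));
    /* The minimum across lanes is all ones only if every lane matched. */
    return vminvq_u32(eq) == 0xffffffffu;
#else
    /* Assumed scalar fallback for targets without the across-lanes minimum. */
    for (int i = 0; i < 4; ++i)
        if (a[i] != b[i])
            return 0;
    return 1;
#endif
}

/* Small self-check; call it from main() or a test runner. */
void check_all_lanes_equal_minv(void)
{
    const uint32_t a[4] = {10, 20, 30, 40};
    const uint32_t b[4] = {10, 20, 30, 40};
    const uint32_t c[4] = {10, 20, 31, 40};
    assert(all_lanes_equal_minv(a, b));
    assert(!all_lanes_equal_minv(a, c));
}
```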
Source: https://stackoverflow.com/questions/41005281/testing-neon-simd-registers-for-equality-over-all-lanes