ARM NEON compare operations generate negative one

Submitted by 那年仲夏 on 2019-12-01 22:53:02

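This is normal for vector compare instructions: each element of the result is all-ones (-1 as a signed integer) when the comparison is true and all-zeros when it is false. That means you can use the compare result directly as a mask with AND or XOR instructions, among other use-cases.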
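To make the mask idea concrete, here is a minimal sketch in portable C that models a single 32-bit lane rather than using real NEON intrinsics. The function names `lane_cmp_gt` and `lane_keep_if_gt` are made up for illustration. The first roughly corresponds to one lane of a signed greater-than compare such as vcgt.s32, the second to ANDing that result with another vector.

```c
#include <stdint.h>

/* Scalar model of ONE 32-bit lane of a signed greater-than vector compare.
   True gives all ones (0xFFFFFFFF, which is -1 when read as signed),
   false gives all zeros. Hypothetical helper, not a NEON API. */
uint32_t lane_cmp_gt(int32_t a, int32_t b) {
    return (a > b) ? UINT32_MAX : 0u;
}

/* Keep x where a > b and zero it elsewhere.
   The compare result is used directly as an AND mask. */
uint32_t lane_keep_if_gt(uint32_t x, int32_t a, int32_t b) {
    return x & lane_cmp_gt(a, b);
}
```

With an all-ones mask the AND passes every bit of `x` through unchanged, and with an all-zeros mask it clears them, which is why a +1 result would be much less convenient here.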

You usually don't need a +1. If you want to count the number of elements that match, for example, just use a subtract instruction to subtract 0 or -1 from a vector accumulator.
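The counting trick can be sketched the same way. This is a scalar model in plain C, not vectorised code: each loop iteration stands in for one lane of an equality compare followed by a subtract from the accumulator. The name `count_equal` is an assumption for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the elements of data[0..n) equal to key.
   Each iteration models one lane: compare to get 0 or -1,
   then subtract that from the accumulator. */
uint32_t count_equal(const int32_t *data, size_t n, int32_t key) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t mask = (data[i] == key) ? UINT32_MAX : 0u; /* 0 or -1 */
        acc -= mask; /* acc - (-1) == acc + 1, wrapping modulo 2^32 */
    }
    return acc;
}
```

In a real vector loop the accumulator would be a vector with one running count per lane, summed horizontally once at the end.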


To get an integer +1, you could subtract the compare result from 0, or right-shift it by the element size minus 1 (e.g. for 32-bit elements, a logical right-shift by 31 leaves just the low bit as 0 or 1, with the rest of the bits all-zero). You could also AND it with a vector of +1s that you created earlier.
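As a rough check that the three options agree, here is a per-lane model in portable C. The function names are hypothetical, and `mask` is assumed to be a compare result, i.e. either all zeros or all ones.

```c
#include <stdint.h>

/* Each function turns a compare mask (0 or 0xFFFFFFFF) into 0 or 1. */

/* Subtract from zero: 0 - (-1) == 1, and 0 - 0 == 0. */
uint32_t bool_by_negate(uint32_t mask) {
    return 0u - mask;
}

/* Logical right shift by (element size - 1) keeps only the top bit. */
uint32_t bool_by_shift(uint32_t mask) {
    return mask >> 31;
}

/* AND with +1 keeps only the low bit. */
uint32_t bool_by_and(uint32_t mask) {
    return mask & 1u;
}
```

The shift has to be a logical (unsigned) one. An arithmetic shift would copy the sign bit down and leave the all-ones mask unchanged.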

I don't know which of these would be best for ARM, or if that would depend on the microarchitecture. (I really only know SIMD for x86 SSE/AVX.) I'm sure NEON can do at least one of the options I described, though.

There's not a lot of conditional stuff in NEON, but what there is is really only workable with bitwise, rather than Boolean, logic - see e.g. vbsl.
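For anyone who hasn't met vbsl, the per-bit behaviour can be modelled in plain C as below. This is only a sketch of one 32-bit lane with made-up names (`lane_bsl`, `lane_max_u32`), not NEON code. The second function shows how a compare mask feeds a select to pick between two values without a branch.

```c
#include <stdint.h>

/* Bitwise select: for every bit, take it from if_set where the mask bit
   is 1, and from if_clear where the mask bit is 0. */
uint32_t lane_bsl(uint32_t mask, uint32_t if_set, uint32_t if_clear) {
    return (mask & if_set) | (~mask & if_clear);
}

/* Branch-free unsigned maximum built from a compare mask plus a select. */
uint32_t lane_max_u32(uint32_t a, uint32_t b) {
    uint32_t mask = (a > b) ? UINT32_MAX : 0u; /* models an unsigned compare lane */
    return lane_bsl(mask, a, b);
}
```

Because the select works bit by bit, it relies on the compare producing all-ones rather than +1. With a mask of 1, only the lowest bit would come from the first operand.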

If you have horrible memories of BASIC and really hate bitwise truth values, then the trivial way to convert the mask to a Boolean is to just take the top bit of each element:

vshr.u32 q9, q9, #31

Although negation, whilst arguably less clear to read at a glance, could be microscopically better performance-wise in some cases:

vneg.s32 q9, q9

(From a browse through microarchitectural timings, both operations are pretty much identical, but some theoretical advantages of vneg over vshr are that it consumes its inputs later on Cortex-A8, and that it can issue down both ASIMD pipes on Cortex-A57/A72.)

Either way, as said at the top, this only really makes sense for storing the result back to memory to be looked at by non-vectorised code.
