ARM NEON: comparing 128 bit values

白昼怎懂夜的黑 提交于 2019-11-30 17:44:10

问题


I'm interested in finding the fastest way (lowest cycle count) of comparing the values stored into NEON registers (say Q0 and Q3) on a Cortex-A9 core (VFP instructions allowed).

So far I have the following:

(1) Using the VFP floating point comparison:

vcmp.f64        d0, d6
vmrs            APSR_nzcv, fpscr
vcmpeq.f64      d1, d7
vmrseq          APSR_nzcv, fpscr

If the 64bit "floats" are equivalent to NaN, this version will not work.

(2) Using the NEON narrowing and the VFP comparison (this time only once and in a NaN-safe manner):

vceq.i32        q15, q0, q3
vmovn.i32       d31, q15
vshl.s16        d31, d31, #8
vcmp.f64        d31, d29
vmrs            APSR_nzcv, fpscr

The D29 register is previously preloaded with the right 16bit pattern:

vmov.i16        d29, #65280     ; 0xff00

My question is: is there any better than this? Am I overseeing some obvious way to do it?


回答1:


I believe you can reduce it by one instruction. By using the shift left and insert (VLSI), you can combine the 4 32-bit values of Q15 into 4 16-bit values in D31. You can then compare with 0 and get the floating point flags.

vceq.i32  q15, q0, q3
vlsi.32   d31, d30, #16
vcmp.f64  d31, #0
vmrs      APSR_nzcv, fpscr


来源:https://stackoverflow.com/questions/9068959/arm-neon-comparing-128-bit-values

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!