SSE _mm_movemask_epi8 equivalent method for ARM NEON


你的背包 2020-12-10 00:05

I decided to continue optimising FAST corner detection and got stuck on the _mm_movemask_epi8 SSE instruction. How can I rewrite it for ARM NEON with uint8x16_t?

4 answers

醉酒成梦 (OP)

2020-12-10 00:21

Note that I haven't tested any of this, but something like this might work:

X := the vector that you want to create the mask from
A := 0x808080808080...
B := 0x00FFFEFDFCFB...  (i.e. per-byte shift counts 0,-1,-2,-3,... from byte 7 down to byte 0; a negative count shifts right)

X = vand_u8(X, A);  // Keep bit 7 (the msb) of each byte in X
X = vshl_u8(X, B);  // X[7]>>=0; X[6]>>=1; X[5]>>=2; ...
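// Each byte of X now contains its msb shifted 7-N bits to the right, where N
// is the byte index, so byte N has at most bit N set.
// Do 3 pairwise adds in order to pack all these into X[0]. The set bits never
// overlap, so adding them is the same as ORing them.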
X = vpadd_u8(X, X); 
X = vpadd_u8(X, X); 
X = vpadd_u8(X, X);
// X[0] should now contain the mask. Clear the remaining bytes if necessary

For a 128-bit vector (uint8x16_t) this would need to be done once for each 64-bit half, since vpadd_u8 only works on 64-bit vectors.
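
As a rough sketch of how the steps above could look with real intrinsics for a full uint8x16_t: the helper names (neon_movemask_u8, movemask_scalar, movemask_u8) are made up for illustration, and the shift table {-7, ..., 0} is my reading of the "X[7]>>=0; X[6]>>=1; ..." comment rather than something spelled out in full above. It processes the two 64-bit halves separately and lets the first vpadd_u8 combine them, so three pairwise adds still suffice. The scalar function gives the expected _mm_movemask_epi8 result (bit N = msb of byte N) and doubles as a fallback when NEON is not available.

```c
#include <stdint.h>

/* Scalar reference: bit N of the result is the msb of byte N,
 * which is what _mm_movemask_epi8 returns for a 16-byte vector. */
static inline uint16_t movemask_scalar(const uint8_t bytes[16])
{
    uint16_t mask = 0;
    for (int i = 0; i < 16; ++i)
        mask |= (uint16_t)((bytes[i] >> 7) << i);
    return mask;
}

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: mask, shift, then three pairwise adds. */
static inline uint16_t neon_movemask_u8(uint8x16_t x)
{
    /* Lane N is shifted right by 7-N (vshl_u8 shifts right for negative counts). */
    static const int8_t shift_counts[8] = { -7, -6, -5, -4, -3, -2, -1, 0 };
    const int8x8_t shifts = vld1_s8(shift_counts);
    const uint8x8_t msb = vdup_n_u8(0x80);

    /* Keep bit 7 of each byte and move it to bit N within each half. */
    uint8x8_t lo = vshl_u8(vand_u8(vget_low_u8(x), msb), shifts);
    uint8x8_t hi = vshl_u8(vand_u8(vget_high_u8(x), msb), shifts);

    /* Lanes 0-3 come from the low half, lanes 4-7 from the high half. */
    uint8x8_t sum = vpadd_u8(lo, hi);
    sum = vpadd_u8(sum, sum); /* lanes 0-1: low half, lanes 2-3: high half */
    sum = vpadd_u8(sum, sum); /* lane 0: low mask, lane 1: high mask */

    return (uint16_t)(vget_lane_u8(sum, 0) | (vget_lane_u8(sum, 1) << 8));
}
#endif

/* Wrapper: uses the NEON path when compiled for NEON, else the scalar one. */
static inline uint16_t movemask_u8(const uint8_t bytes[16])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return neon_movemask_u8(vld1q_u8(bytes));
#else
    return movemask_scalar(bytes);
#endif
}
```

Comparing movemask_u8 against movemask_scalar on a few inputs is a cheap way to check the NEON path on the target device before relying on it.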
