SSE _mm_movemask_epi8 equivalent method for ARM NEON

你的背包 2020-12-10 00:05

I decided to continue optimising FAST corner detection and got stuck on the _mm_movemask_epi8 SSE instruction. How can I rewrite it for ARM NEON, taking a uint8x16_t as input?
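For context, here is a minimal scalar sketch of what _mm_movemask_epi8 computes: it collects the most significant bit of each of the 16 bytes into the low 16 bits of an integer. The helper name movemask_epi8_scalar is made up for illustration and is not part of SSE or NEON; it is only meant as a reference to compare a NEON version against.

```c
#include <stdint.h>

/* Scalar reference (hypothetical helper name): bit i of the result is
   the most significant bit of bytes[i], for i = 0..15. */
static int movemask_epi8_scalar(const uint8_t bytes[16])
{
    int mask = 0;
    for (int i = 0; i < 16; ++i) {
        mask |= (bytes[i] >> 7) << i;   /* MSB of byte i goes to bit i */
    }
    return mask;
}
```

With 0x80 in byte 0, 0xFF in byte 15 and zeros elsewhere, this returns 0x8001.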

4 answers
  •  南方客 (OP)
    2020-12-10 00:17

    After some tests, it looks like the following code works correctly:

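    #include <arm_neon.h>
    #include <stdint.h>

    int32_t _mm_movemask_epi8_neon(uint8x16_t input)
    {
        // Per-lane shift counts: a negative count makes vshl_u8 shift right,
        // so the top bit of byte i ends up at bit position i.
        const int8_t __attribute__ ((aligned (16))) xr[8] = {-7,-6,-5,-4,-3,-2,-1,0};
        uint8x8_t mask_and = vdup_n_u8(0x80);
        int8x8_t mask_shift = vld1_s8(xr);

        uint8x8_t lo = vget_low_u8(input);
        uint8x8_t hi = vget_high_u8(input);

        // Keep only the top bit of each byte, then move it to bit i.
        lo = vand_u8(lo, mask_and);
        lo = vshl_u8(lo, mask_shift);

        hi = vand_u8(hi, mask_and);
        hi = vshl_u8(hi, mask_shift);

        // Three pairwise adds sum all eight bytes into lane 0.
        lo = vpadd_u8(lo,lo);
        lo = vpadd_u8(lo,lo);
        lo = vpadd_u8(lo,lo);

        hi = vpadd_u8(hi,hi);
        hi = vpadd_u8(hi,hi);
        hi = vpadd_u8(hi,hi);

        // Lane 0 of lo holds mask bits 0..7, lane 0 of hi holds bits 8..15.
        // vget_lane_u8 is used instead of hi[0]/lo[0], which is a compiler extension.
        return (vget_lane_u8(hi, 0) << 8) | vget_lane_u8(lo, 0);
    }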
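To see why the and/shift/add sequence yields the mask, here is a rough scalar model of what happens to one 8-byte half. It is a sketch in plain C rather than NEON, and half_mask_model is a hypothetical name used only for this illustration. After masking and shifting, each lane holds a different single bit, so adding the lanes is equivalent to OR-ing them and cannot overflow a byte.

```c
#include <stdint.h>

/* Scalar model of one 8-byte half (hypothetical helper, not a NEON API):
   mask the top bit, shift it right by (7 - i) so it lands on bit i,
   then sum the eight lanes as the three pairwise adds would. */
static uint8_t half_mask_model(const uint8_t lanes[8])
{
    uint8_t sum = 0;
    for (int i = 0; i < 8; ++i) {
        uint8_t bit = (uint8_t)((lanes[i] & 0x80) >> (7 - i));
        sum = (uint8_t)(sum + bit);   /* distinct bits, so add behaves like OR */
    }
    return sum;
}
```

For the lanes {0x80, 0x00, 0xFF, 0x7F, 0x00, 0x00, 0x00, 0x90} the model returns 0x85 (bits 0, 2 and 7 set). The full 16-bit mask is then (half_mask_model(hi) << 8) | half_mask_model(lo).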
    
