I am migrating vectorized code written using SSE2 intrinsics to AVX2 intrinsics. Much to my disappointment, I discovered that the shift instructions _mm256_slli_si256 and _mm256_srli_si256 operate only on the two 128-bit lanes of the AVX register separately, shifting zeroes in at the lane boundary rather than carrying bits across the full 256-bit value.
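To make the problem concrete, here is a minimal sketch (the test value and names are mine): shifting an all-ones register left by one byte leaves a zero byte at the bottom of both 128-bit lanes, not just at the bottom of the whole register.

#include <stdio.h>
#include <stdint.h>
#include <immintrin.h>

int main (void)
{
    __m256i v = _mm256_set1_epi8 ((char) 0xFF);   // all bytes 0xFF so shifted-in zeroes stand out
    __m256i s = _mm256_slli_si256 (v, 1);         // byte shift left, applied to each 128-bit lane separately
    uint8_t out[32];
    _mm256_storeu_si256 ((__m256i *) out, s);
    for (int i = 0; i < 32; i++)
        printf ("%02x ", out[i]);                 // bytes 0 AND 16 both print as 00
    printf ("\n");
    return 0;
}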
Here is a function to bit-shift a ymm register left using AVX2. I use it to shift left by one, though it looks like it works for shift counts of up to 63 bits.
#include <immintrin.h>

//----------------------------------------------------------------------------
// bit shift left a 256-bit value using ymm registers
// __m256i *data - data to shift, updated in place
// int count - number of bits to shift (1 to 63)
// return: __m256i - carry out: the bits shifted out of the top qword,
//                   placed in bits count-1..0 of the low qword
static __m256i bitShiftLeft256ymm (__m256i *data, int count)
{
    __m256i innerCarry, carryOut, rotate;

    innerCarry = _mm256_srli_epi64 (*data, 64 - count);      // top 'count' bits of each qword land in bits count-1..0
    rotate = _mm256_permute4x64_epi64 (innerCarry, 0x93);    // rotate qwords left one position: each carry moves to the next higher qword, the top one wraps to the bottom
    innerCarry = _mm256_blend_epi32 (_mm256_setzero_si256 (), rotate, 0xFC); // clear lower qword so the wrapped-around top carry does not re-enter
    *data = _mm256_slli_epi64 (*data, count);                // shift all qwords left
    *data = _mm256_or_si256 (*data, innerCarry);             // propagate carries from low qwords
    carryOut = _mm256_xor_si256 (innerCarry, rotate);        // clear all except lower qword: the bits shifted out of the whole value
return carryOut;
}
//----------------------------------------------------------------------------
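A quick usage sketch (the test value and the printing are mine): shift a 256-bit value left by one and capture the carry. It assumes bitShiftLeft256ymm above is in the same file.

#include <stdio.h>
#include <stdint.h>

int main (void)
{
    // top bit set so a shift by 1 produces a carry out; low bit set so the shift itself is visible
    __m256i value = _mm256_set_epi64x ((long long) 0x8000000000000000ULL, 0, 0, 1);
    __m256i carry = bitShiftLeft256ymm (&value, 1);

    uint64_t v[4], c[4];
    _mm256_storeu_si256 ((__m256i *) v, value);
    _mm256_storeu_si256 ((__m256i *) c, carry);

    printf ("value = %016llx %016llx %016llx %016llx\n",
            (unsigned long long) v[3], (unsigned long long) v[2],
            (unsigned long long) v[1], (unsigned long long) v[0]);
    printf ("carry = %llx\n", (unsigned long long) c[0]);   // prints 1
    return 0;
}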