Emulating shifts on 32 bytes with AVX

忘了有多久 2020-11-29 11:34

I am migrating vectorized code written using SSE2 intrinsics to AVX2 intrinsics.

Much to my disappointment, I discover that the shift instructions _mm256_sll

3 answers
  •  情话喂你
    2020-11-29 11:49

    Here is a function that shifts a ymm register left by a given number of bits using AVX2. I use it to shift left by one, but it works for any count from 0 to 63, since each bit then moves at most into the next-higher qword.

    #include <immintrin.h>

    //----------------------------------------------------------------------------
    // bit shift left a 256-bit value using ymm registers
    //          __m256i *data - data to shift (updated in place)
    //          int count     - number of bits to shift (0..63)
    // return:  __m256i       - carry-out bits in the low `count` bits of qword 0;
    //                          all other bits are zero
    
    static __m256i bitShiftLeft256ymm (__m256i *data, int count)
       {
       __m256i innerCarry, carryOut, rotate;
    
       innerCarry = _mm256_srli_epi64 (*data, 64 - count);                        // carry outs in bit 0 of each qword
       rotate     = _mm256_permute4x64_epi64 (innerCarry, 0x93);                  // rotate ymm left 64 bits
       innerCarry = _mm256_blend_epi32 (_mm256_setzero_si256 (), rotate, 0xFC);   // clear lower qword
       *data      = _mm256_slli_epi64 (*data, count);                             // shift all qwords left
       *data      = _mm256_or_si256 (*data, innerCarry);                          // propagate carries from lower qwords
       carryOut   = _mm256_xor_si256 (innerCarry, rotate);                        // keep only qword 0: the carry out of the top qword
       return carryOut;
       }
    
    //----------------------------------------------------------------------------
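
The ymm code above may be easier to follow next to a plain C model of the same steps. The sketch below is a hypothetical scalar reference (the name shiftLeft256_scalar is made up and is not part of the answer). It assumes the 256-bit value is held as four uint64_t qwords with d[0] least significant, matching the qword lanes of a __m256i in memory order. Each loop mirrors one or more of the intrinsics, and like the original it is limited to counts 0..63.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference model of bitShiftLeft256ymm (not from the
 * original answer). d[0] is the least significant qword, d[3] the most.
 * Shifts d left by count bits in place and returns the bits shifted out of
 * the top qword, right-aligned. */
static uint64_t shiftLeft256_scalar(uint64_t d[4], int count)
{
    uint64_t inner[4], rotate[4];

    assert(count >= 0 && count <= 63);

    /* Step 1 (_mm256_srli_epi64 by 64 - count): the bits that leave the top
     * of each qword. In C, shifting a uint64_t by 64 is undefined, so
     * count == 0 is special-cased; the vector shift simply yields 0 there. */
    for (int i = 0; i < 4; i++)
        inner[i] = (count == 0) ? 0 : d[i] >> (64 - count);

    /* Step 2 (_mm256_permute4x64_epi64 with 0x93): move each carry up one
     * lane; the carry out of qword 3 wraps around into lane 0. */
    for (int i = 0; i < 4; i++)
        rotate[i] = inner[(i + 3) % 4];

    /* Blend with zero, slli and or: shift every qword left and OR in the
     * carry from the qword below; qword 0 receives no carry. */
    for (int i = 0; i < 4; i++)
        d[i] = (d[i] << count) | (i > 0 ? rotate[i] : 0);

    /* The xor step: only lane 0 of rotate, the carry out of qword 3,
     * is returned. */
    return rotate[0];
}
```

To check the vector version against a model like this, one could store its result with _mm256_storeu_si256 into a uint64_t[4] and compare the two on random inputs and counts.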
    
