Shifting SSE/AVX registers 32 bits left and right while shifting in zeros

走了就别回头了 2020-12-10 06:13

I want to shift SSE/AVX registers multiples of 32 bits left or right while shifting in zeros.

Let me be more precise on the shifts I'm interested in. For SSE I wan

2 answers
  • 2020-12-10 06:45

    Your SSE implementation is fine, but I suggest you use the _mm_slli_si128 implementation (with _mm_srli_si128 for the opposite direction) for both of the shifts. The casts make it look complicated, but it really boils down to just one instruction for each shift.
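
As a minimal sketch of what that looks like for a shift by one element (32 bits), assuming the data is held as `__m128` floats. The function names `shift_left_1_sse` and `shift_right_1_sse` are made up for illustration and are not taken from the question. "Left" here means toward the higher element index, which is the direction `_mm_slli_si128` moves bytes.

```c
#include <emmintrin.h>  /* SSE2: _mm_slli_si128, _mm_srli_si128 and the cast intrinsics */

/* Hypothetical helper: move every float one element toward the higher index.
   Element 0 becomes 0. The casts only reinterpret the register and emit no
   instructions, so this is a single byte shift (PSLLDQ). */
static inline __m128 shift_left_1_sse(__m128 x)
{
    return _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 4));
}

/* Hypothetical helper: move every float one element toward the lower index.
   Element 3 becomes 0 (PSRLDQ). */
static inline __m128 shift_right_1_sse(__m128 x)
{
    return _mm_castsi128_ps(_mm_srli_si128(_mm_castps_si128(x), 4));
}
```

The byte count has to be a compile-time constant, so a shift by two or three elements would pass 8 or 12 instead of 4.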

    Unfortunately, your AVX2 implementation won't work. Almost all AVX instructions are effectively just two SSE instructions operating in parallel on two adjacent 128-bit lanes, so nothing moves across the lane boundary. For your first shift_AVX2 example you'd get:

    0, 0, 1, 2, 0, 4, 5, 6
    ----------- ----------
     LS lane     MS lane
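
The per-lane behaviour can be reproduced with a small sketch. The question's `shift_AVX2` is not shown here, so this assumes it was built on the 256-bit byte shift `_mm256_slli_si256`; the helper name `lanewise_shift_left_1` and the `TARGET_AVX2` macro are invented for this example.

```c
#include <immintrin.h>

/* Assumption: GCC or Clang. The attribute lets this one function use AVX2
   without compiling the whole file with -mavx2. */
#if defined(__GNUC__)
#define TARGET_AVX2 __attribute__((target("avx2")))
#else
#define TARGET_AVX2
#endif

/* Hypothetical helper: apply the 256-bit byte shift to eight floats.
   _mm256_slli_si256 shifts each 128-bit lane on its own, so element 3
   is dropped rather than carried into element 4. */
TARGET_AVX2
void lanewise_shift_left_1(const float *in, float *out)
{
    __m256i v = _mm256_castps_si256(_mm256_loadu_ps(in));
    v = _mm256_slli_si256(v, 4);  /* 4 bytes = one float, within each lane */
    _mm256_storeu_ps(out, _mm256_castsi256_ps(v));
}
```

Feeding it the values 0 to 7 gives the pattern in the diagram: a zero appears at the start of both lanes, and the elements 3 and 7 are lost.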
    

    All is not lost, however: one of the few instructions that does work across lanes is _mm256_permutevar8x32_ps. You'll need to combine it with an _mm256_and_ps to zero the shifted-in elements. Note also that this is an AVX2 solution. AVX on its own is very limited for anything other than basic arithmetic/logic operations, so I think you'll have a hard time doing this efficiently without AVX2.
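
A rough sketch of that approach for a shift by one element toward the higher index might look like the following. The helper name `shift_left_1_avx2`, the `TARGET_AVX2` macro and the particular index and mask constants are my own choices for illustration, not something given in the question.

```c
#include <immintrin.h>

/* Assumption: GCC or Clang. The attribute lets this one function use AVX2
   without compiling the whole file with -mavx2. */
#if defined(__GNUC__)
#define TARGET_AVX2 __attribute__((target("avx2")))
#else
#define TARGET_AVX2
#endif

/* Hypothetical helper: shift eight floats one element toward the higher
   index, crossing the 128-bit lane boundary, and zero element 0. */
TARGET_AVX2
void shift_left_1_avx2(const float *in, float *out)
{
    __m256 x = _mm256_loadu_ps(in);

    /* Destination element i reads source element idx[i].
       Slot 0 is a placeholder because it is cleared below. */
    const __m256i idx = _mm256_setr_epi32(0, 0, 1, 2, 3, 4, 5, 6);

    /* All-ones keeps an element, all-zeros clears it. */
    const __m256 keep = _mm256_castsi256_ps(
        _mm256_setr_epi32(0, -1, -1, -1, -1, -1, -1, -1));

    __m256 y = _mm256_and_ps(_mm256_permutevar8x32_ps(x, idx), keep);
    _mm256_storeu_ps(out, y);
}
```

Other shift distances would only change the two constant vectors. Because the index is a runtime vector, those constants could also be loaded from a table indexed by the shift amount.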

  • 2020-12-10 06:51

    You can do a shift right with _mm256_permute_ps, _mm256_permute2f128_ps, and _mm256_blend_ps as follows:

    __m256 t0 = _mm256_permute_ps(x, 0x39);            // [x4  x7  x6  x5  x0  x3  x2  x1]
    __m256 t1 = _mm256_permute2f128_ps(t0, t0, 0x81);  // [ 0   0   0   0  x4  x7  x6  x5] 
    __m256 y  = _mm256_blend_ps(t0, t1, 0x88);         // [ 0  x7  x6  x5  x4  x3  x2  x1]
    

    The result is in y. To rotate right instead, set the _mm256_permute2f128_ps control byte to 0x01 instead of 0x81. Shifts and rotates to the left, and by larger amounts, can be done similarly by changing the permute and blend control bytes.
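
For example, a shift and a rotate by one element in the other direction could be sketched as below. The control bytes 0x93, 0x08 and 0x11 are my own derivation from the same pattern, and the helper names and the `TARGET_AVX` macro are hypothetical. The register comments list the highest element first, as in the snippet above. All three intrinsics are plain AVX, so this variant should not need AVX2.

```c
#include <immintrin.h>

/* Assumption: GCC or Clang. The attribute lets these functions use AVX
   without compiling the whole file with -mavx. */
#if defined(__GNUC__)
#define TARGET_AVX __attribute__((target("avx")))
#else
#define TARGET_AVX
#endif

/* Hypothetical helper: out[0] = 0, out[i] = in[i-1]. */
TARGET_AVX
void shift_left_1_avx(const float *in, float *out)
{
    __m256 x  = _mm256_loadu_ps(in);
    __m256 t0 = _mm256_permute_ps(x, 0x93);            /* [x6  x5  x4  x7  x2  x1  x0  x3] */
    __m256 t1 = _mm256_permute2f128_ps(t0, t0, 0x08);  /* [x2  x1  x0  x3   0   0   0   0] */
    __m256 y  = _mm256_blend_ps(t0, t1, 0x11);         /* [x6  x5  x4  x3  x2  x1  x0   0] */
    _mm256_storeu_ps(out, y);
}

/* Hypothetical helper: out[0] = in[7], out[i] = in[i-1]. */
TARGET_AVX
void rotate_left_1_avx(const float *in, float *out)
{
    __m256 x  = _mm256_loadu_ps(in);
    __m256 t0 = _mm256_permute_ps(x, 0x93);            /* [x6  x5  x4  x7  x2  x1  x0  x3] */
    __m256 t1 = _mm256_permute2f128_ps(t0, t0, 0x01);  /* [x2  x1  x0  x3  x6  x5  x4  x7] */
    __m256 y  = _mm256_blend_ps(t0, t1, 0x11);         /* [x6  x5  x4  x3  x2  x1  x0  x7] */
    _mm256_storeu_ps(out, y);
}
```

The in-lane rotate does most of the work. The lane swap only supplies the two elements that sit at a lane boundary, and the blend picks exactly those two positions (bits 0 and 4 of the blend mask).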
