SSE/SIMD shift with one-byte element size / granularity?

梦如初夏 2020-12-22 03:27

As you know, SSE provides the following SIMD shift instructions: PSLL(W/D/Q) and PSRL(W/D/Q).

There's no PSLLB instruction, so how ca

2 answers
  •  死守一世寂寞
    2020-12-22 03:54

    Here's another way to emulate "psrab" (an arithmetic right shift on each byte), which works for SSE or AVX with one scratch register:

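      __ punpckhbw(scratch, src);   // high 8 src bytes -> high byte of each word; junk in low bytes
      __ punpcklbw(dst, src);       // low 8 src bytes -> high byte of each word; junk in low bytes
      __ psraw(scratch, 8 + shift); // arithmetic shift discards the junk and sign-extends
      __ psraw(dst, 8 + shift);
      __ packsswb(dst, scratch);    // pack words back to bytes (values fit, so no saturation)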
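
For readers who work with intrinsics rather than a macro assembler, the same sequence can be sketched in C with SSE2. This is only an illustration under a few assumptions: the helper name `srai_epi8_emul` is made up for this example, `shift` is expected to be in the range 0..7, and the source register is unpacked with itself so that the "junk" low byte is simply a copy of the source byte.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not a real intrinsic): arithmetic right shift of
   16 signed bytes by `shift` (assumed 0..7), following the
   unpack / psraw / packsswb sequence from the answer above. */
static __m128i srai_epi8_emul(__m128i src, int shift)
{
    /* psraw with a register count reads it from the low 64 bits. */
    __m128i count = _mm_cvtsi32_si128(8 + shift);

    /* Each source byte lands in the high byte of a 16-bit word.
       The low byte is junk (here: a copy of the same byte). */
    __m128i lo = _mm_unpacklo_epi8(src, src);  /* bytes 0..7  */
    __m128i hi = _mm_unpackhi_epi8(src, src);  /* bytes 8..15 */

    /* Shifting by 8 + shift drops the junk byte and leaves the
       sign-extended result of (byte >> shift) in each word. */
    lo = _mm_sra_epi16(lo, count);
    hi = _mm_sra_epi16(hi, count);

    /* Every word is now within -128..127, so the signed saturating
       pack just narrows the words back to bytes in the original order. */
    return _mm_packs_epi16(lo, hi);
}
```

Load 16 bytes with `_mm_loadu_si128`, call the helper, and store the result with `_mm_storeu_si128`. When the shift amount is a compile-time constant, `_mm_srai_epi16` could replace `_mm_sra_epi16`, which matches the immediate form of `psraw` used in the answer.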
    
