How to perform element-wise left shift with __m128i?

﹥>﹥吖頭↗ submitted on 2019-12-06 04:49:38

Question


The SSE shift instructions I have found can only shift by the same amount on all the elements:

  • _mm_sll_epi32()
  • _mm_slli_epi32()

Both apply one shift count to every element.

Is there a way to apply different shifts to the different elements? Something like this:

__m128i a, b;

r0 := a0 << b0;
r1 := a1 << b1;
r2 := a2 << b2;
r3 := a3 << b3;

Answer 1:


There exists the _mm_shl_epi32() intrinsic that does exactly that.

http://msdn.microsoft.com/en-us/library/gg445138.aspx

However, it requires the XOP instruction set, which is implemented only by AMD processors of the Bulldozer family (Interlagos is the server variant). AMD's later Zen processors dropped XOP, and it is not available on any Intel processor.

If you want to do it without XOP instructions, you will need to do it the hard way: pull the elements out and shift them one by one.

With SSE4.1, you can do this using the following intrinsics:

  • _mm_insert_epi32()
  • _mm_extract_epi32()

http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse41_reg_ins_ext.htm

Those will let you extract parts of a 128-bit register into regular registers to do the shift and put them back.

If you go with the latter method, it'll be horrifically inefficient. That's why _mm_shl_epi32() exists in the first place.
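The extract, shift, and insert approach described above can be sketched roughly as follows. This is only an illustration, not code from the answer: the names shl_each_epi32, SHL_LANE, and SHL_SSE41 are made up for the example, the target attribute is a GCC/Clang-specific assumption (other compilers need their own SSE4.1 switch), and masking the count to 0..31 is a choice made here to keep the scalar shift well defined in C.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_extract_epi32, _mm_insert_epi32 */
#include <stdint.h>

/* Assumption: on GCC/Clang, enable SSE4.1 for this function only, so the
   file also builds without -msse4.1. */
#if defined(__GNUC__) && !defined(__SSE4_1__)
#define SHL_SSE41 __attribute__((target("sse4.1")))
#else
#define SHL_SSE41
#endif

/* Shift lane i of a left by lane i of b and write it into lane i of r.
   The lane index must be a compile-time constant, hence the macro.
   Counts are masked to 0..31 because larger shifts are undefined in C. */
#define SHL_LANE(r, a, b, i)                                   \
    _mm_insert_epi32((r),                                      \
        (int)((uint32_t)_mm_extract_epi32((a), (i))            \
              << (_mm_extract_epi32((b), (i)) & 31)),          \
        (i))

/* Hypothetical helper: r[i] = a[i] << b[i] for the four 32-bit lanes. */
SHL_SSE41
static __m128i shl_each_epi32(__m128i a, __m128i b)
{
    __m128i r = a;
    r = SHL_LANE(r, a, b, 0);
    r = SHL_LANE(r, a, b, 1);
    r = SHL_LANE(r, a, b, 2);
    r = SHL_LANE(r, a, b, 3);
    return r;
}
```

For example, calling shl_each_epi32() with a = <1, 2, 3, 4> and b = <0, 1, 2, 3> (lowest lane first) should give <1, 4, 12, 32>. Every lane makes a round trip through a general-purpose register, which is where the inefficiency comes from.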




Answer 2:


Without XOP, your options are limited. If you can control the format of the shift count argument, then you can use _mm_mullo_epi16, since multiplying by a power of two is the same as shifting left by that power.

For example, if you want to shift the eight 16-bit elements in an SSE register by <0, 1, 2, 3, 4, 5, 6, 7>, you can multiply by 2 raised to each shift count, i.e., by <1, 2, 4, 8, 16, 32, 64, 128>.
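A minimal sketch of that multiply-as-shift idea, assuming the shift counts are the fixed per-lane values 0 through 7. The function name shl_by_lane_index_epi16 is invented for the example; only SSE2 is needed.

```c
#include <emmintrin.h>  /* SSE2: _mm_mullo_epi16 */
#include <stdint.h>

/* Shift the eight 16-bit lanes of v left by 0, 1, ..., 7 bits respectively.
   _mm_mullo_epi16 keeps the low 16 bits of each product, which is exactly
   what a 16-bit left shift keeps. */
static __m128i shl_by_lane_index_epi16(__m128i v)
{
    /* 2^0 .. 2^7, lowest lane first */
    const __m128i pow2 = _mm_setr_epi16(1, 2, 4, 8, 16, 32, 64, 128);
    return _mm_mullo_epi16(v, pow2);
}
```

If the counts are not known in advance, the vector of powers has to be computed or looked up first, which is the hard part of this approach.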




Answer 3:


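In some circumstances, this can substitute for _mm_shl_epi32(a, b):

_mm_mullo_epi32(a, 1 << b);   // pseudocode: "1 << b" stands for a vector of per-element powers of two

Generally speaking, this requires b to have a constant value; I don't know of an efficient way to calculate (1 << b) using older SSE instructions. Note that _mm_mullo_epi32 itself requires SSE4.1.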
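One known trick for building the powers of two at run time, sketched here under assumptions rather than taken from the answer, is to write each count into the exponent field of a single-precision float and convert the result back to an integer. The sketch assumes every count is in the range 0..30 and uses SSE2 only, so the 32-bit multiply is assembled from _mm_mul_epu32. The names pow2_epi32, mullo_epi32_sse2, and shl_var_epi32_sse2 are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 only */
#include <stdint.h>

/* Per-lane (1 << b): add b to the exponent field of 1.0f, then convert
   the resulting float 2^b back to an integer. Assumes 0 <= b <= 30. */
static __m128i pow2_epi32(__m128i b)
{
    const __m128i one_bits = _mm_set1_epi32(0x3f800000);  /* bits of 1.0f */
    __m128i bits = _mm_add_epi32(_mm_slli_epi32(b, 23), one_bits);
    return _mm_cvttps_epi32(_mm_castsi128_ps(bits));
}

/* Low 32 bits of a[i] * b[i], built from SSE2's 32x32->64 multiply. */
static __m128i mullo_epi32_sse2(__m128i a, __m128i b)
{
    /* Products of lanes 0 and 2, each as a 64-bit value. */
    __m128i even = _mm_mul_epu32(a, b);
    /* Move lanes 1 and 3 into positions 0 and 2, then multiply those. */
    __m128i odd = _mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4));
    /* Gather the low halves of both products into the two lowest lanes. */
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
    odd = _mm_shuffle_epi32(odd, _MM_SHUFFLE(0, 0, 2, 0));
    /* Interleave back into lane order 0, 1, 2, 3. */
    return _mm_unpacklo_epi32(even, odd);
}

/* Hypothetical helper: r[i] = a[i] << b[i], for counts in 0..30. */
static __m128i shl_var_epi32_sse2(__m128i a, __m128i b)
{
    return mullo_epi32_sse2(a, pow2_epi32(b));
}
```

Whether this beats extracting and shifting each lane depends on the processor, so it is worth measuring before relying on it.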



Source: https://stackoverflow.com/questions/11148833/how-to-perform-element-wise-left-shift-with-m128i
