Shift elements to the left of a SIMD register based on boolean mask

Posted by 蓝咒 on 2019-12-22 00:36:03

Question


This question is related to this: Optimal uint8_t bitmap into a 8 x 32bit SIMD "bool" vector

I would like to create an optimal function with this signature:

__m256i PackLeft(__m256i inputVector, __m256i boolVector);

The desired behaviour is that, for an input of 64-bit ints like this:

inputVector = {42, 17, 13, 3}

boolVector = {true, false, true, false}

It masks out all values that have false in the boolVector and then repacks the remaining values to the left. For the input above, the return value should be:

{42, 13, X, X}

... Where X is "I don't care".

An obvious way to do this is to use _mm_movemask_epi8 to get an integer bitmask out of the bool vector, look up the shuffle mask in a table, and then do a shuffle with that mask.
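For reference, the table-driven idea can be modelled in plain scalar C, roughly as below. This is only a sketch under stated assumptions: the register is modelled as an array of four 64-bit lanes, and the names lane_table, init_lane_table, lane_mask and pack_left_table are hypothetical, not from any library. In real AVX2 code the mask would more likely come from a movemask intrinsic such as _mm256_movemask_pd, and the final loop would be a single permute such as _mm256_permutevar8x32_epi32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table: lane_table[m][i] is the source lane that ends up in
   output lane i when the 4-bit lane mask is m. */
static uint8_t lane_table[16][4];

/* Fill the table once: list the selected lanes in order, then pad. */
static void init_lane_table(void) {
    for (int m = 0; m < 16; ++m) {
        int out = 0;
        for (int lane = 0; lane < 4; ++lane)
            if (m & (1 << lane))
                lane_table[m][out++] = (uint8_t)lane;
        while (out < 4)
            lane_table[m][out++] = 0; /* "don't care" slots */
    }
}

/* Scalar stand-in for a movemask: bit i is the top bit of bools[i]. */
static unsigned lane_mask(const uint64_t bools[4]) {
    unsigned mask = 0;
    for (int i = 0; i < 4; ++i)
        mask |= (unsigned)(bools[i] >> 63) << i;
    return mask;
}

/* Look up the shuffle for this mask and apply it lane by lane. */
static void pack_left_table(const uint64_t in[4], const uint64_t bools[4],
                            uint64_t out[4]) {
    unsigned mask = lane_mask(bools);
    assert(mask < 16);
    for (int i = 0; i < 4; ++i)
        out[i] = in[lane_table[mask][i]];
}
```

In this model a "true" lane is all ones, as SIMD compare instructions produce, and the table costs 16 entries of 4 bytes each.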

However, I would like to avoid a lookup table if possible. Is there a faster solution?


Answer 1:


This is covered quite well by Andreas Fredriksson in his 2015 GDC talk: https://deplinenoise.files.wordpress.com/2015/03/gdc2015_afredriksson_simd.pdf

Starting on slide 104, he covers how to do this using only SSSE3 and then using just SSE2.




Answer 2:


Just saw this problem. Perhaps you have already solved it, but I am still writing up the logic for other programmers who may need to handle this situation.

The solution (in Intel ASM format) is given below. It consists of three steps:

Step 0: Convert the 8-bit mask into a 64-bit mask, with each set bit in the original mask represented as 8 set bits in the expanded mask.

Step 1: Use this expanded mask to extract the relevant bits from the source data.

Step 2: Since you require the data to be left packed, shift the output left by the appropriate number of bits.

The code is as below:

; Step 0 : convert the 8 bit mask into a 64 bit bytewise mask
    xor     r8,r8
    movzx   rax,byte ptr mask_pattern
    mov     r9,rax  ; save a copy of the mask - avoids a memory read in Step 2
    mov     rcx,8   ; number of bits in the mask
outer_loop :
    shr     al,1    ; move the least significant bit of the mask into CF
    setnc   dl      ; DL = 0 if CF=1, else DL = 1
    dec     dl      ; if the mask bit was 1, DL becomes 0FFh, else 00h
    shrd    r8,rdx,8    ; shift R8 right by 8 and bring DL in as its top byte
    loop    outer_loop
; R8 now holds the mask expanded to one byte (00h or 0FFh) per original bit
; Step 1 : extract the selected bytes, compressed toward the low-order end
;          (PEXT requires BMI2)
    mov     rax,qword ptr data_pattern
    pext    rax,rax,r8
; Step 2 : shift left so the packed bytes sit at the high-order end of RAX
    popcnt  r9,r9   ; get the count of bits set in the mask
    mov     rcx,8
    sub     cl,r9b  ; compute 8-(count of bits set to 1 in the mask)
    shl     cl,3    ; convert the count of bytes to a count of bits (x8)
    shl     rax,cl
; The required data is in RAX
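The same three steps can be sketched in C, roughly as follows. This is a hedged illustration, not the answer author's code: the helper names expand_mask, extract_bits, count_bits and pack_bytes are hypothetical, and the sketch assumes the same 8-byte data word and 8-bit mask as the assembly. It uses the _pext_u64 intrinsic when the compiler targets x86-64 with BMI2 enabled, and otherwise falls back to a portable bit-by-bit loop that computes the same result.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>
#endif

/* Step 0: expand each bit of an 8-bit mask into a full byte (0x00 or 0xFF). */
static uint64_t expand_mask(uint8_t mask) {
    uint64_t wide = 0;
    for (int i = 0; i < 8; ++i)
        if (mask & (1u << i))
            wide |= 0xFFull << (8 * i);
    return wide;
}

/* Step 1: gather the bits of src selected by mask into the low-order bits. */
static uint64_t extract_bits(uint64_t src, uint64_t mask) {
#if defined(__BMI2__) && defined(__x86_64__)
    return _pext_u64(src, mask);
#else
    uint64_t result = 0;
    int out = 0;
    for (int i = 0; i < 64; ++i) {
        if (mask & (1ull << i)) {
            result |= ((src >> i) & 1ull) << out;
            ++out;
        }
    }
    return result;
#endif
}

/* Number of set bits in the 8-bit mask (what POPCNT computes in the asm). */
static int count_bits(uint8_t mask) {
    int n = 0;
    for (int i = 0; i < 8; ++i)
        n += (mask >> i) & 1;
    return n;
}

/* Steps 0-2 combined: keep the bytes selected by mask, then move them to
   the high-order end of the word, as the final SHL in the assembly does. */
static uint64_t pack_bytes(uint64_t data, uint8_t mask) {
    uint64_t packed = extract_bits(data, expand_mask(mask));
    int kept = count_bits(mask);
    assert(kept >= 0 && kept <= 8);
    if (kept == 0)
        return 0; /* a 64-bit shift by 64 is undefined in C */
    return packed << (8 * (8 - kept));
}
```

If the packed bytes should stay at the low-order end (element 0 first), the final shift can be dropped, since the extract step already leaves them there.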

I trust this helps.



Source: https://stackoverflow.com/questions/28735461/shift-elements-to-the-left-of-a-simd-register-based-on-boolean-mask
