What is the fastest way to transpose the bits in an 8x8 block of bits?

余生分开走 2020-12-05 21:10

I'm not sure of the exact term for what I'm trying to do. I have an 8x8 block of bits stored in 8 bytes; each byte stores one row. When

7 Answers
  •  甜味超标
    2020-12-05 22:06

    If you want an optimized solution, use the SSE2 extensions on x86. You need 3 SIMD instructions: MOVQ (move 8 bytes), PSLLW (packed shift left logical words) and PMOVMSKB (packed move mask byte), plus 2 regular x86 instructions: LEA (load effective address) and MOV (move).
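    To make the idea concrete, here is a portable C model of what these instructions compute, usable as a reference on any platform. It is a sketch, not part of the original answer: the names movemask8 and transpose8_model are hypothetical, and it assumes bit j of m[i] (bit 0 = least significant) is the element at row i, column j, which matches the listing below writing the high bits into o[7]. It shifts single bytes instead of 16-bit words; that gives the same result, because a bit that PSLLW carries across a byte boundary never reaches bit 7 before the last read.

```c
#include <stdint.h>

/* Hypothetical helper: portable model of PMOVMSKB on 8 bytes.
   Bit i of the result is the high bit (bit 7) of b[i]. */
static uint8_t movemask8(const uint8_t b[8]) {
    uint8_t mask = 0;
    for (int i = 0; i < 8; i++)
        mask |= (uint8_t)((b[i] >> 7) << i);
    return mask;
}

/* Hypothetical helper: scalar model of the read-then-shift sequence.
   Assumes bit j of m[i] is the element at row i, column j; o[j] receives column j. */
static void transpose8_model(const uint8_t m[8], uint8_t o[8]) {
    uint8_t t[8];
    for (int i = 0; i < 8; i++)
        t[i] = m[i];
    for (int k = 7; k >= 0; k--) {
        o[k] = movemask8(t);             /* like PMOVMSKB + MOV [ecx+k], al */
        for (int i = 0; i < 8; i++)
            t[i] = (uint8_t)(t[i] << 1); /* like PSLLW, but per byte */
    }
}
```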

    unsigned char m[8]; // input: m[i] holds row i
    unsigned char o[8]; // output: o[j] receives column j
    LEA ecx, [o]
    // ecx = the address of the output array/matrix
    MOVQ xmm0, [m]
    // xmm0 = 0|0|0|0|0|0|0|0|m[7]|m[6]|m[5]|m[4]|m[3]|m[2]|m[1]|m[0]
    PMOVMSKB eax, xmm0
    // al = m[7][7]...m[0][7], the high bit of each byte; the rest of eax is 0
    MOV [ecx+7], al
    // o[7] is now the last column
    PSLLW xmm0, 1
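    // shift each 16-bit word left by 1, so bit 6 of every byte becomes bit 7
    // (the bit carried across each byte boundary cannot reach bit 7 within 7 shifts)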
    PMOVMSKB eax, xmm0
    MOV [ecx+6], al
    PSLLW xmm0, 1
    PMOVMSKB eax, xmm0
    MOV [ecx+5], al
    PSLLW xmm0, 1
    PMOVMSKB eax, xmm0
    MOV [ecx+4], al
    PSLLW xmm0, 1
    PMOVMSKB eax, xmm0
    MOV [ecx+3], al
    PSLLW xmm0, 1
    PMOVMSKB eax, xmm0
    MOV [ecx+2], al
    PSLLW xmm0, 1
    PMOVMSKB eax, xmm0
    MOV [ecx+1], al
    PSLLW xmm0, 1
    PMOVMSKB eax, xmm0
    MOV [ecx], al
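
    If you would rather let a C or C++ compiler handle register allocation (and sidestep the inline-assembly syntax question), the same MOVQ / PMOVMSKB / PSLLW sequence can be written with SSE2 intrinsics from <emmintrin.h>. This is a sketch assuming an x86 target with SSE2 (always available on x86-64) and the same bit layout as the listing; transpose8_sse2 is a hypothetical name. _mm_loadl_epi64 maps to MOVQ, _mm_movemask_epi8 to PMOVMSKB and _mm_slli_epi16 to PSLLW.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2: _mm_loadl_epi64, _mm_movemask_epi8, _mm_slli_epi16 */

/* Hypothetical name. Assumes bit j of m[i] is the element at row i, column j.
   Writes column j of the input into o[j]. */
static void transpose8_sse2(const uint8_t m[8], uint8_t o[8]) {
    /* MOVQ: low 8 bytes = m[0..7], high 8 bytes = 0 */
    __m128i x = _mm_loadl_epi64((const __m128i *)m);
    for (int k = 7; k >= 0; k--) {
        /* PMOVMSKB: bit i of the mask = bit 7 of byte i; bits 8..15 are 0 here */
        o[k] = (uint8_t)_mm_movemask_epi8(x);
        /* PSLLW by 1: the next column moves into bit 7 of each byte
           (the shift after the last read is redundant but harmless) */
        x = _mm_slli_epi16(x, 1);
    }
}
```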
    

    That is 25 x86 instructions in total, compared with the nested for-loop solution with 64 iterations. Sorry, the listing uses Intel syntax rather than the AT&T syntax that GCC-style C/C++ inline assembly expects by default.
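
    For comparison, this is a sketch of the nested-loop version with 64 inner iterations, which also serves as a reference when testing a SIMD version. It assumes the same layout (bit j of m[i] is the element at row i, column j); transpose8_naive is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical name. Reference transpose: 8 x 8 = 64 inner iterations.
   Element (i, j) = bit j of m[i] is moved to bit i of o[j]. */
static void transpose8_naive(const uint8_t m[8], uint8_t o[8]) {
    for (int j = 0; j < 8; j++) {
        uint8_t col = 0;
        for (int i = 0; i < 8; i++)
            col |= (uint8_t)(((m[i] >> j) & 1u) << i);
        o[j] = col;
    }
}
```

    Transposing twice returns the original rows, which makes a convenient self-check.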
