What is the fastest way to transpose the bits in an 8x8 block of bits?

余生分开走 2020-12-05 21:10

I'm not sure of the exact term for what I'm trying to do. I have an 8x8 block of bits stored in 8 bytes, each byte stores one row. When

7 answers

南方客 (OP)

2020-12-05 21:54

This is similar to the "get column in a bitboard" problem and can be solved efficiently by treating the 8 input bytes as a single 64-bit integer. If bit 0 is the least significant bit and byte 0 is the first byte in the array, then I assume you want the following:

b07 b06 b05 b04 b03 b02 b01 b00      b70 b60 b50 b40 b30 b20 b10 b00
b17 b16 b15 b14 b13 b12 b11 b10      b71 b61 b51 b41 b31 b21 b11 b01
b27 b26 b25 b24 b23 b22 b21 b20      b72 b62 b52 b42 b32 b22 b12 b02
b37 b36 b35 b34 b33 b32 b31 b30  =>  b73 b63 b53 b43 b33 b23 b13 b03
b47 b46 b45 b44 b43 b42 b41 b40  =>  b74 b64 b54 b44 b34 b24 b14 b04
b57 b56 b55 b54 b53 b52 b51 b50      b75 b65 b55 b45 b35 b25 b15 b05
b67 b66 b65 b64 b63 b62 b61 b60      b76 b66 b56 b46 b36 b26 b16 b06
b77 b76 b75 b74 b73 b72 b71 b70      b77 b67 b57 b47 b37 b27 b17 b07

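where bXY is bit number Y of byte X. Shifting the wanted column into the most significant bit of every byte, masking off the other seven columns and reading the array as a uint64_t, we get

h0000000 g0000000 f0000000 e0000000 d0000000 c0000000 b0000000 a0000000

in little endian (most significant byte on the left), where a to h are the bits of the selected column in bytes 0 to 7 (b00 to b70 for column 0). Now we just need to multiply that value by the magic number 0x2040810204081, which leaves hgfedcba in the most significant byte. That is exactly the corresponding row of the transposed result, and shifting right by 56 extracts it.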
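Why the multiplication works can be checked with a small sketch. It assumes, as the get_byte function in this answer does, that the selected column already sits in the top bit of each byte. The helper names build_magic and gather_msbs are made up for illustration. The bit at position 8k+7 is moved by the partial product shifted 7(7-k) places to position 56+k, and no two partial products land on the same bit, so no carries disturb the result.

```c
#include <assert.h>
#include <stdint.h>

/* Sets bits 0, 7, 14, ..., 49: multiplying by this adds up eight copies
   of the input, each shifted 7 bits further than the previous one. */
static uint64_t build_magic(void)
{
    uint64_t magic = 0;
    for (int j = 0; j < 8; j++)
        magic |= 1ull << (7 * j);
    return magic; /* same value as the literal 0x2040810204081 */
}

/* Hypothetical helper: gathers the top bit of each byte into one byte.
   Bit 7 of byte k ends up as bit k of the result. */
static uint8_t gather_msbs(uint64_t v)
{
    /* Precondition: only the top bit of each byte may be set. */
    assert((v & ~0x8080808080808080ull) == 0);
    return (uint8_t)((v * build_magic()) >> 56);
}
```

For instance, an input with only the top bits of byte 0 and byte 7 set yields 0x81.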

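#include <stdint.h>

/* Returns column `col` (0 = least significant bit) of the 8x8 block as one byte. */
uint8_t get_byte(uint64_t matrix, unsigned col)
{
    const uint64_t column_mask = 0x8080808080808080ull; /* top bit of every byte */
    const uint64_t magic       = 0x2040810204081ull;    /* bits 0, 7, 14, ..., 49 */

    /* Move the column into bit 7 of each byte, isolate it, then gather it into the top byte. */
    return (uint8_t)((((matrix << (7 - col)) & column_mask) * magic) >> 56);
}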

// You may need to change the endianness if you address the data in a different way
uint64_t block8x8 = ((uint64_t)byte[7] << 56) | ((uint64_t)byte[6] << 48)
                  | ((uint64_t)byte[5] << 40) | ((uint64_t)byte[4] << 32)
                  | ((uint64_t)byte[3] << 24) | ((uint64_t)byte[2] << 16)
                  | ((uint64_t)byte[1] <<  8) |  (uint64_t)byte[0];

for (int i = 0; i < 8; i++)
    byte_out[i] = get_byte(block8x8, i);
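The pieces above can be put together and checked against a plain bit-by-bit transpose. The following is only a sketch under the same layout assumptions (byte 0 is the first row, bit 0 is the least significant bit). The names pack_rows, transpose8x8 and transpose8x8_naive are introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same technique as get_byte above: extract column `col` as one byte. */
static uint8_t get_byte(uint64_t matrix, unsigned col)
{
    const uint64_t column_mask = 0x8080808080808080ull;
    const uint64_t magic       = 0x2040810204081ull;

    assert(col < 8);
    return (uint8_t)((((matrix << (7 - col)) & column_mask) * magic) >> 56);
}

/* Packs the rows so that rows[0] becomes the least significant byte. */
static uint64_t pack_rows(const uint8_t rows[8])
{
    uint64_t m = 0;
    for (int i = 7; i >= 0; i--)
        m = (m << 8) | rows[i];
    return m;
}

/* Transpose via eight multiply-based column extractions. */
static void transpose8x8(const uint8_t in[8], uint8_t out[8])
{
    const uint64_t m = pack_rows(in);
    for (unsigned c = 0; c < 8; c++)
        out[c] = get_byte(m, c);
}

/* Reference version: moves one bit at a time. */
static void transpose8x8_naive(const uint8_t in[8], uint8_t out[8])
{
    for (unsigned c = 0; c < 8; c++) {
        out[c] = 0;
        for (unsigned r = 0; r < 8; r++)
            out[c] |= (uint8_t)(((in[r] >> c) & 1u) << r);
    }
}
```

Comparing the two functions on a handful of inputs is a cheap way to confirm that the bit and byte order match what your data actually uses.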

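In practice you should keep the block in a properly aligned 8-byte array (or directly in a uint64_t) so that it can be loaded as a single 64-bit value without combining the bytes first. Note that byte 0 ends up as the low byte only on little-endian machines.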
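One common way to sidestep the alignment question is to copy the bytes with memcpy, which compilers typically turn into a single load. This is a sketch rather than part of the original answer. The names load_block and host_is_little_endian are hypothetical, and the result matches the shift-and-OR packing only on little-endian hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the first byte in memory is the least significant one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Loads 8 row bytes as one 64-bit value.
   memcpy has no alignment requirement and avoids aliasing problems. */
static uint64_t load_block(const uint8_t *rows)
{
    uint64_t m;
    assert(rows != NULL);
    memcpy(&m, rows, sizeof m);
    return m; /* rows[0] is the low byte only on little-endian hosts */
}
```

On a big-endian host the byte order of the loaded value is reversed, so the explicit shift-and-OR version remains the portable choice.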

Intel introduced the PEXT instruction (accessible via the _pext_u64 intrinsic) as part of the BMI2 instruction set, which arrived together with AVX2 on Haswell. It is designed for exactly this kind of bit gathering, so the function can be done in a single instruction:

byte_out[col] = (uint8_t)_pext_u64(matrix, column_mask >> (7 - col)); // mask selects bit `col` of every byte
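To make the PEXT variant testable on machines without BMI2, the sketch below falls back to a software loop that mimics what the instruction does: it collects the source bits selected by the mask and packs them into the low bits of the result. The names pext64_soft and get_byte_pext are made up here. The hardware path is only compiled when the compiler defines __BMI2__ (for example with -mbmi2 on GCC or Clang).

```c
#include <assert.h>
#include <stdint.h>
#ifdef __BMI2__
#include <immintrin.h>
#endif

/* Portable stand-in for PEXT: for each set bit of `mask`, from lowest to
   highest, copy the matching bit of `src` into the next low bit of the result. */
static uint64_t pext64_soft(uint64_t src, uint64_t mask)
{
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        const uint64_t lowest = mask & (~mask + 1); /* lowest set bit of mask */
        if (src & lowest)
            out |= bit;
        mask &= mask - 1;                           /* clear that bit */
    }
    return out;
}

/* Extracts column `col`: the mask has bit `col` set in every byte. */
static uint8_t get_byte_pext(uint64_t matrix, unsigned col)
{
    assert(col < 8);
    const uint64_t mask = 0x0101010101010101ull << col;
#ifdef __BMI2__
    return (uint8_t)_pext_u64(matrix, mask);
#else
    return (uint8_t)pext64_soft(matrix, mask);
#endif
}
```

The mask 0x0101010101010101 << col is the same value as 0x8080808080808080 >> (7 - col); either form selects one bit per row.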

More ways to transpose the array can be found in the chess programming wiki
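One branch-free alternative of that kind is a three-step "delta swap" that transposes the whole 64-bit block at once instead of extracting one column at a time. The sketch below assumes the same layout as above (bit index 8*row + col). It swaps the off-diagonal 4x4 blocks, then the 2x2 blocks inside each of those, then single bits. The function names are chosen here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reads the bit at (row, col), with bit index 8*row + col. */
static unsigned get_bit(uint64_t m, unsigned row, unsigned col)
{
    assert(row < 8 && col < 8);
    return (unsigned)((m >> (8 * row + col)) & 1u);
}

/* Transposes an 8x8 bit matrix held in a 64-bit word. */
static uint64_t transpose8x8_swap(uint64_t x)
{
    const uint64_t k4 = 0x0F0F0F0F00000000ull; /* rows 4-7, cols 0-3 */
    const uint64_t k2 = 0x3333000033330000ull; /* rows 2,3,6,7, cols 0,1,4,5 */
    const uint64_t k1 = 0x5500550055005500ull; /* odd rows, even cols */
    uint64_t t;

    t  = k4 & (x ^ (x << 28));  /* differences between the 4x4 blocks */
    x ^= t ^ (t >> 28);         /* swap them */
    t  = k2 & (x ^ (x << 14));  /* same for 2x2 blocks */
    x ^= t ^ (t >> 14);
    t  = k1 & (x ^ (x << 7));   /* same for single bits */
    x ^= t ^ (t >> 7);
    return x;
}
```

Applying the function twice returns the original value, which makes for an easy sanity check.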
