I'm not sure of the exact term for what I'm trying to do. I have an 8x8 block of bits stored in 8 bytes, with each byte storing one row. When
This is similar to the "get column" problem for bitboards and can be solved efficiently by treating the input bytes as the 8 bytes of a 64-bit integer. If bit 0 is the least significant bit and byte 0 is the first byte in the array, then I assume you want to do the following:
    b07 b06 b05 b04 b03 b02 b01 b00      b70 b60 b50 b40 b30 b20 b10 b00
    b17 b16 b15 b14 b13 b12 b11 b10      b71 b61 b51 b41 b31 b21 b11 b01
    b27 b26 b25 b24 b23 b22 b21 b20      b72 b62 b52 b42 b32 b22 b12 b02
    b37 b36 b35 b34 b33 b32 b31 b30  =>  b73 b63 b53 b43 b33 b23 b13 b03
    b47 b46 b45 b44 b43 b42 b41 b40  =>  b74 b64 b54 b44 b34 b24 b14 b04
    b57 b56 b55 b54 b53 b52 b51 b50      b75 b65 b55 b45 b35 b25 b15 b05
    b67 b66 b65 b64 b63 b62 b61 b60      b76 b66 b56 b46 b36 b26 b16 b06
    b77 b76 b75 b74 b73 b72 b71 b70      b77 b67 b57 b47 b37 b27 b17 b07
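where bXY is byte X's bit number Y. Shifting the wanted column into the most significant bit of each byte, masking out the other 7 columns and reading the array as a uint64_t, we'll have

    h0000000 g0000000 f0000000 e0000000 d0000000 c0000000 b0000000 a0000000

written from the most significant byte (byte 7) down to byte 0, i.e. the bytes are stored in little endian, where a to h are that column's bits in bytes 0 to 7 (b00 to b70 for column 0). Now we just need to multiply that value by the magic number 0x2040810204081. Each set bit of the magic number adds a copy of the value shifted left by a multiple of 7, and no two copies overlap, so the eight bits end up next to each other with hgfedcba in the most significant byte, which is the flipped form wanted in the result.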
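    #include <stdint.h>

    // Returns column `col` (0 = least significant bit of each row byte);
    // bit k of the result is row k's bit.
    uint8_t get_byte(uint64_t matrix, unsigned col)
    {
        const uint64_t column_mask = 0x8080808080808080ull;
        const uint64_t magic       = 0x2040810204081ull;
        // Move the column to bit 7 of each byte, isolate it, gather it into the top byte
        return (uint8_t)((((matrix << (7 - col)) & column_mask) * magic) >> 56);
    }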
    // You may need to change the endianness if you address the data in a different way
    uint64_t block8x8 = ((uint64_t)byte[7] << 56) | ((uint64_t)byte[6] << 48)
                      | ((uint64_t)byte[5] << 40) | ((uint64_t)byte[4] << 32)
                      | ((uint64_t)byte[3] << 24) | ((uint64_t)byte[2] << 16)
                      | ((uint64_t)byte[1] <<  8) |  (uint64_t)byte[0];
    for (int i = 0; i < 8; i++)
        byte_out[i] = get_byte(block8x8, i);
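As a sanity check, the mapping in the diagram can also be written as a naive bit-by-bit loop and compared with the multiplication trick. This is only a minimal sketch: `transpose_naive` and `transpose_magic` are hypothetical helper names that are not part of the answer, and `get_byte` is repeated so that the snippet compiles on its own.

```c
#include <stdint.h>

/* Same function as above, repeated so that this snippet is self-contained. */
static uint8_t get_byte(uint64_t matrix, unsigned col)
{
    const uint64_t column_mask = 0x8080808080808080ull;
    const uint64_t magic = 0x2040810204081ull;
    return (uint8_t)((((matrix << (7 - col)) & column_mask) * magic) >> 56);
}

/* Hypothetical reference: bit x of out[y] is bit y of in[x], one bit at a time. */
static void transpose_naive(const uint8_t in[8], uint8_t out[8])
{
    for (int y = 0; y < 8; y++) {
        out[y] = 0;
        for (int x = 0; x < 8; x++)
            out[y] |= (uint8_t)(((in[x] >> y) & 1u) << x);
    }
}

/* Hypothetical wrapper: pack the rows (byte 0 in the low 8 bits),
   then pull out each column with the multiplication trick. */
static void transpose_magic(const uint8_t in[8], uint8_t out[8])
{
    uint64_t block = 0;
    for (int i = 7; i >= 0; i--)
        block = (block << 8) | in[i];
    for (unsigned col = 0; col < 8; col++)
        out[col] = get_byte(block, col);
}
```

Both functions should produce the same 8 output bytes for any input, so the naive version can serve as a test oracle while the packed version is the one to use in a hot loop.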
In practice you should read the data directly into an 8-byte array so that you don't need to combine the bytes later, but the array needs to be properly aligned if it is going to be accessed as a uint64_t.
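One hedged way to sidestep both the manual combining and the alignment concern is to copy the 8 bytes into a uint64_t with memcpy, which optimizing compilers typically turn into a single load. The sketch below assumes a little-endian host for the two loaders to agree; `load_block_shifts`, `load_block_memcpy` and `host_is_little_endian` are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Portable on any host: builds the value from shifts, byte 0 in the low 8 bits. */
static uint64_t load_block_shifts(const uint8_t rows[8])
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | rows[i];
    return v;
}

/* memcpy has no alignment requirement on the source buffer.
   The result equals load_block_shifts() only on little-endian hosts. */
static uint64_t load_block_memcpy(const uint8_t rows[8])
{
    uint64_t v;
    memcpy(&v, rows, sizeof v);
    return v;
}

/* Runtime check of the host byte order. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a big-endian machine the memcpy version would put byte 0 in the most significant position, so the shift-based loader (or a byte swap) would be needed there.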
Alongside AVX2, Intel introduced the PEXT instruction (accessible via the _pext_u64 intrinsic) in the BMI2 instruction set for this purpose, so the function can be done in a single instruction. Here column_mask >> (7 - col) selects bit col of every byte:

    data[i] = _pext_u64(matrix, column_mask >> (7 - col));
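To make the mask semantics concrete, here is a minimal portable sketch of what PEXT computes: it collects the bits of the source that are selected by the mask and packs them into the low bits of the result, lowest mask bit first. `pext_soft` and `get_byte_pext` are hypothetical names; when the compiler defines `__BMI2__` on x86-64 (for example with `-mbmi2` on GCC or Clang), the sketch calls the real `_pext_u64` intrinsic from `<immintrin.h>` instead of the loop.

```c
#include <stdint.h>
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>
#endif

/* Hypothetical software model of PEXT: gather the bits of src selected by mask. */
static uint64_t pext_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    unsigned out_bit = 0;
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);   /* isolate the lowest set mask bit */
        if (src & lowest)
            result |= (uint64_t)1 << out_bit;
        out_bit++;
        mask &= mask - 1;                       /* clear that mask bit */
    }
    return result;
}

/* Column `col` of the 8x8 block; bit k of the result comes from byte k. */
static uint8_t get_byte_pext(uint64_t matrix, unsigned col)
{
    /* Same mask as 0x8080808080808080 >> (7 - col). */
    const uint64_t mask = 0x0101010101010101ull << col;
#if defined(__BMI2__) && defined(__x86_64__)
    return (uint8_t)_pext_u64(matrix, mask);
#else
    return (uint8_t)pext_soft(matrix, mask);
#endif
}
```

The software loop is only there to illustrate the behaviour and to keep the snippet usable on machines without BMI2; it is not meant to be fast.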
More ways to transpose the array can be found in the chess programming wiki