I want to multiply an 8x8 binary matrix, represented as an unsigned 64-bit integer, by an 8-bit vector represented as an unsigned char. However, due to some other issues
You ONLY HAVE 256 vectors! Use lookup tables to generate the right bitmasks; then your logic will be something like:
output_bit_n = bool(matrix[n] & lookup[vector])
In other words, your lookup table can transpose an 8-bit value into the 64-bit world.
You can efficiently pack each output bit into the result with rotate-with-carry instructions if the compiler isn't smart enough to optimise (value<<=1)|=result.
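
To make the idea concrete, here is a minimal C++ sketch of the lookup-and-pack approach. It rests on assumptions that are not in the question: row n of the matrix is taken to occupy bits 8n..8n+7 of the 64-bit word, and the product is treated as a boolean (AND/OR) product, which is what the bool(...) in the logic above suggests. The names make_lookup and multiply are made up for illustration.

```cpp
#include <array>
#include <cstdint>

// ASSUMPTION: row n of the matrix lives in bits 8n..8n+7 of the 64-bit word.
// ASSUMPTION: boolean (AND/OR) product, so output bit n is set when row n
// and the vector have at least one set bit in common.

// Build the 256-entry table (hypothetical helper): entry v holds the byte v
// copied into each of the 8 bytes of a 64-bit word.
std::array<std::uint64_t, 256> make_lookup() {
    std::array<std::uint64_t, 256> table{};
    for (unsigned v = 0; v < 256; ++v) {
        table[v] = v * 0x0101010101010101ULL;
    }
    return table;
}

// Multiply the matrix by the vector (hypothetical helper).
unsigned char multiply(std::uint64_t matrix, unsigned char vector,
                       const std::array<std::uint64_t, 256>& lookup) {
    // One AND applies the vector to all eight rows at once.
    const std::uint64_t masked = matrix & lookup[vector];

    unsigned char value = 0;
    // Walk from row 7 down to row 0 so that row n ends up in output bit n.
    for (int n = 7; n >= 0; --n) {
        const bool result = ((masked >> (8 * n)) & 0xFF) != 0;
        value = static_cast<unsigned char>((value << 1) | result);
    }
    return value;
}
```

Build the table once with make_lookup() and pass it to every multiply() call. If the product is meant over GF(2) instead, the nonzero test on each byte would have to become a parity test. A column-ordered matrix would need a different table, so treat the layout here as illustrative only.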