I'm trying to optimize, as much as possible, an operation done on slices of `u32` read from arrays of `u8`. As such, I'm testing different options (for loop
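For reference, here is a minimal sketch of two of the variants being compared: an explicit `for` loop and an iterator chain, both over `chunks_exact(4)`. This is not the actual operation, which isn't specified. It assumes the bytes are little-endian and uses a wrapping sum as a stand-in. `sum_u32_loop` and `sum_u32_iter` are hypothetical names.

```rust
use std::convert::TryInto; // already in the prelude on edition 2021, needed on 2018

/// Variant A: explicit for loop over 4-byte chunks.
/// Assumption: little-endian input; trailing bytes (len % 4) are ignored.
fn sum_u32_loop(bytes: &[u8]) -> u32 {
    let mut acc: u32 = 0;
    for chunk in bytes.chunks_exact(4) {
        // chunks_exact guarantees each chunk has length 4, so this conversion cannot fail.
        let v = u32::from_le_bytes(chunk.try_into().unwrap());
        // Stand-in operation: wrapping sum (hypothetical, not the real workload).
        acc = acc.wrapping_add(v);
    }
    acc
}

/// Variant B: the same computation written as an iterator chain.
fn sum_u32_iter(bytes: &[u8]) -> u32 {
    bytes
        .chunks_exact(4)
        // Build the 4-byte array by indexing; each chunk is exactly 4 bytes long.
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .fold(0u32, |acc, v| acc.wrapping_add(v))
}
```

Because `chunks_exact` tells the compiler that every chunk is exactly four bytes, it can usually drop the per-element bounds checks. That tends to matter more for performance than the choice between a loop and an iterator.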