I want to speed up a matrix multiply algorithm. I am trying to use the Intel SIMD functions, but I am finding that I don't quite understand what they do.
For context