Improving the performance of Matrix Multiplication
Question

This is my code for speeding up matrix multiplication, but it is only 5% faster than the simple version. What can I do to boost it as much as possible?

*The matrices are accessed through a flat index, for example C[sub2ind(i,j,n)] for the C[i, j] position.

    void matrixMultFast(float * const C,       /* output matrix */
                        float const * const A, /* first matrix */
                        float const * const B, /* second matrix */
                        int const n,           /* number of rows/cols */
                        int const ib,          /* size of i block */
                        int const jb,          /* size of j block */
                        int
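
For reference, the "simple" baseline and the flat-index access described above might look roughly like the sketch below. The question shows neither the definition of sub2ind nor the simple version, so both are assumptions here: sub2ind is assumed to be row-major (i*n + j), and matrixMultSimple is a hypothetical name for the naive triple loop.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: row-major layout, so element (i, j) of an n x n matrix
   is stored at offset i*n + j. The question's own sub2ind is not shown. */
static inline size_t sub2ind(int i, int j, int n)
{
    return (size_t)i * (size_t)n + (size_t)j;
}

/* Hypothetical baseline: naive triple-loop product C = A * B
   for square n x n matrices stored as flat float arrays. */
void matrixMultSimple(float *const C,        /* output matrix */
                      float const *const A,  /* first matrix */
                      float const *const B,  /* second matrix */
                      int const n)           /* number of rows/cols */
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;  /* accumulate locally, store to C once */
            for (int k = 0; k < n; k++) {
                sum += A[sub2ind(i, k, n)] * B[sub2ind(k, j, n)];
            }
            C[sub2ind(i, j, n)] = sum;
        }
    }
}
```

Under the row-major assumption, the inner loop walks A contiguously but strides through B by n floats per step, which is the access pattern a blocked version is usually trying to improve.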