I intend to multiply two matrices using a cache-friendly method (one that leads to fewer cache misses).
I found out that this can be done with a cache friendly t
@Cesar's answer is not correct. For example, the inner loop
    for (int k = 0; k < n; k++)
        s += a[i,k] * Bcolj[k];
goes through the i-th column of a.
The following code should ensure we always visit data row by row.
    // I, K and J are the array dimensions, deduced from the arguments.
    template <int I, int K, int J>
    void multiply(const double (&a)[I][K],
                  const double (&b)[K][J],
                  double (&c)[I][J])
    {
        for (int j=0; j
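
For reference, here is a minimal self-contained sketch of the row-by-row idea, assuming row-major built-in C++ arrays. The name `multiply_ikj` and the i-k-j loop order are my own choices for illustration and are not necessarily what the listing above uses. With this order the innermost loop walks along row k of `b` and row i of `c` with unit stride, while `a` is read along row i.

```cpp
#include <cstddef>

// Hypothetical helper (not from the original post): computes c = a * b
// for row-major built-in arrays using the i-k-j loop order.
template <std::size_t I, std::size_t K, std::size_t J>
void multiply_ikj(const double (&a)[I][K],
                  const double (&b)[K][J],
                  double (&c)[I][J])
{
    // Clear c row by row before accumulating into it.
    for (std::size_t i = 0; i < I; ++i)
        for (std::size_t j = 0; j < J; ++j)
            c[i][j] = 0.0;

    for (std::size_t i = 0; i < I; ++i) {
        for (std::size_t k = 0; k < K; ++k) {
            const double aik = a[i][k];    // a is read along row i
            for (std::size_t j = 0; j < J; ++j)
                c[i][j] += aik * b[k][j];  // row k of b, row i of c
        }
    }
}
```

Calling it with a 2x3 `a`, a 3x2 `b` and a 2x2 `c` deduces `I`, `K` and `J` from the array types, so no sizes need to be passed explicitly. Whether this order actually reduces misses for your matrix sizes is something I would measure rather than assume.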