for (int i = 0; i < 100; i++)
    for (int j = 0; j < 100; j++)
        array[j][i] = 0;
        // array[i][j] = 0;
My professor said it was m
I'm a bit late to the party, and there is an excellent answer already. However, I thought I could contribute by demonstrating how one could answer this question experimentally using a profiling tool (on Linux).
I'll use the perf tool in the Ubuntu 10.10 package linux-tools-common.
Here's the little C program I wrote to answer this question:
// test.c
#define DIM 1024

int main()
{
    int v[DIM][DIM];
    unsigned i, j;

    for (i = 0; i < DIM; i++) {
        for (j = 0; j < DIM; j++) {
#ifdef ROW_MAJOR_ORDER
            v[i][j] = 0;
#else
            v[j][i] = 0;
#endif
        }
    }

    return 0;
}
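To make it concrete why the index order matters: C stores v in row-major order, so v[i][j] lives at offset (i*DIM + j)*sizeof(int) from the start of the array. Consecutive j values are neighbours in memory, while consecutive i values are a whole row apart (4 KiB here, assuming 4-byte ints). Here's a tiny separate sketch (not part of the measurement above) that just prints those offsets:

// layout.c: quick check that the array really is laid out row-major,
// i.e. &v[i][j] == base + (i*DIM + j)*sizeof(int).
#include <stdio.h>

#define DIM 1024

int v[DIM][DIM];   /* file scope, so stack size is not a concern */

int main(void)
{
    char *base = (char *) &v[0][0];

    /* Stepping j moves by sizeof(int) bytes; stepping i moves by a whole row. */
    printf("&v[0][1] - &v[0][0] = %td bytes\n", (char *) &v[0][1] - base);
    printf("&v[1][0] - &v[0][0] = %td bytes\n", (char *) &v[1][0] - base);
    return 0;
}

On a platform with 4-byte ints this should print 4 and 4096.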
Then compile the two different versions:
$ gcc test.c -O0 -DROW_MAJOR_ORDER -o row-maj
$ gcc test.c -O0 -o row-min
Note that I've disabled optimization with -O0 so that gcc has no chance to rearrange our loops into something more efficient.
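(If you did want to benchmark with optimizations turned on, one option, shown here only as an untested sketch, is to declare the array volatile so gcc cannot simply drop the stores:)

/* Sketch only: with a plain int array and -O2, gcc may notice that v is
 * never read and delete the loops entirely; volatile forces every store
 * to actually happen, in the order written. */
volatile int v[DIM][DIM];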
We can list the performance events available with perf by doing perf list. In this case, we are interested in cache misses, which correspond to the event cache-misses.
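For example, to narrow the listing down to cache-related events (the exact set of events varies with the kernel and CPU, so I'm only showing the command here):

$ perf list | grep -i cache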
Now it's as simple as running each version of the program numerous times and taking an average:
$ perf stat -e cache-misses -r 100 ./row-min

 Performance counter stats for './row-min' (100 runs):

         286468  cache-misses               ( +-  0.810% )

    0.016588860  seconds time elapsed       ( +-  0.926% )

$ perf stat -e cache-misses -r 100 ./row-maj

 Performance counter stats for './row-maj' (100 runs):

           9594  cache-misses               ( +-  1.203% )

    0.006791615  seconds time elapsed       ( +-  0.840% )
And now we've experimentally verified that the "row-minor" version really does cause far more cache misses: roughly 30 times as many in this run, with the program also taking more than twice as long to finish.
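As a rough sanity check (assuming 4-byte ints and 64-byte cache lines, which is typical but hardware-dependent): the row-major loop writes 16 neighbouring ints per cache line, so its 1024 * 1024 stores only ever touch about 65,536 distinct lines. The column-major loop jumps 4096 bytes between consecutive stores, so every store lands on a different line, and the 64 KiB worth of lines touched during one pass down a column is generally too much for the L1 data cache to keep around until those same lines are needed again, 15 passes later.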