On a machine with a data cache (even a 68030 has one), reading or writing data in consecutive memory locations is much faster, because memory is transferred in blocks (cache lines, whose size depends on the processor): a block is fetched from memory once and then served from the cache on reads, or written back all at once when the cache line is flushed on writes.
By "skipping" data (reading far from the previous read), each access lands in a different block, so the CPU has to go back to memory again.
That's why your first snippet is faster.
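To make the difference concrete, here is a minimal sketch of the two access patterns. It assumes a matrix stored row by row in one flat buffer; the function names and the flat layout are mine for illustration, not taken from your code. Both functions return the same sum, only the order in which memory is touched differs.

```c
#include <assert.h>
#include <stddef.h>

/* Inner loop walks consecutive addresses: one cache-line fetch
   serves several iterations. */
long sum_row_major(const long *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long sum = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += m[i * cols + j];
    return sum;
}

/* Same buffer, loops swapped: each inner step jumps `cols` elements,
   so for a large matrix consecutive reads fall in different cache lines. */
long sum_col_major(const long *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long sum = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            sum += m[i * cols + j];
    return sum;
}
```

On a small matrix that fits entirely in cache you will probably not see a difference; the gap usually shows up once the matrix is larger than the cache, so time it on your real sizes.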
For more complex operations (a fast Fourier transform, for instance), where data is read more than once (unlike in your example), a lot of libraries (FFTW, for instance) offer a stride parameter to accommodate your data organization (in rows or in columns). Never use it: always transpose your data first and use a stride of 1. It will be faster than trying to do it without transposition.
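As a rough sketch of the "transpose first" idea, assuming a row-major matrix of doubles in a flat buffer: `transpose` and `sum_contiguous` are hypothetical helpers written for this answer, and `sum_contiguous` only stands in for the real per-vector work (an FFT, a filter, and so on). No library API is used here.

```c
#include <assert.h>
#include <stddef.h>

/* Out-of-place transpose: src is rows x cols (row-major),
   dst receives the cols x rows result (row-major). */
void transpose(const double *src, double *dst, size_t rows, size_t cols)
{
    assert(src != dst); /* this sketch does not handle in-place transposition */
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            dst[j * rows + i] = src[i * cols + j];
}

/* Stand-in for the real processing: reads n elements with a stride of 1. */
double sum_contiguous(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t k = 0; k < n; k++)
        s += v[k];
    return s;
}
```

After `transpose(src, dst, rows, cols)`, column `j` of the original matrix sits contiguously at `dst + j * rows`, so it can be processed with `sum_contiguous(dst + j * rows, rows)`. The transpose itself is a strided pass, but you pay for it once instead of on every read of the data.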
To make sure your accesses are consecutive, never use 2D notation in the inner loop. First select the row and set a pointer to its start, then run the inner loop over that row:
    for (i = 0; i < ROWS; i++) {
        const long *row = m[i];
        for (j = 0; j < COLS; j++) {
            sum += row[j];
        }
    }
If you cannot do this, it means your data is stored in the wrong orientation.