Why is it worse to initialize a two dimensional array like this?

有刺的猬 2020-12-10 13:40
for (int i = 0; i < 100; i++)
    for (int j = 0; j < 100; j++)
        array[j][i] = 0;
        // array[i][j] = 0;

My professor said it was m

4 answers
  •  难免孤独
    2020-12-10 14:15

    I'm a bit late to the party, and there is an excellent answer already. However, I thought I could contribute by demonstrating how one could answer this question experimentally using a profiling tool (on Linux).

    I'll use the perf tool in the Ubuntu 10.10 package linux-tools-common.

    Here's the little C program I wrote to answer this question:

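    // test.c
    #define DIM 1024
    
    int main(void)
    {
        int v[DIM][DIM];   /* 4 MiB with 4-byte int; fits a typical 8 MiB Linux stack */
        unsigned i, j;
    
        for (i = 0; i < DIM; i++) {
            for (j = 0; j < DIM; j++) {
    #ifdef ROW_MAJOR_ORDER
                v[i][j] = 0;   /* inner loop walks along a row: adjacent addresses */
    #else
                v[j][i] = 0;   /* inner loop walks down a column: DIM ints apart */
    #endif
            }
        }
    
        return 0;
    }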
    

    Then compile the two different versions:

    $ gcc test.c -O0 -DROW_MAJOR_ORDER -o row-maj
    $ gcc test.c -O0 -o row-min
    

    Note I've disabled optimization with -O0 so gcc has no chance to rearrange our loop to be more efficient.

    We can list the performance statistics available with perf by running perf list. In this case we are interested in cache misses, which perf exposes as the event cache-misses.

    Now it's as simple as running each version of the program numerous times and taking an average:

    $ perf stat -e cache-misses -r 100 ./row-min
    
     Performance counter stats for './row-min' (100 runs):
    
                 286468  cache-misses               ( +-   0.810% )
    
            0.016588860  seconds time elapsed   ( +-   0.926% )
    
    $ perf stat -e cache-misses -r 100 ./row-maj
    
     Performance counter stats for './row-maj' (100 runs):
    
                   9594  cache-misses               ( +-   1.203% )
    
            0.006791615  seconds time elapsed   ( +-   0.840% )
    

    And now we've experimentally verified that the "row-minor" version does in fact incur far more cache misses: roughly 30 times as many (286468 versus 9594), and it takes about 2.4 times as long to run.
