Why does vectorizing the loop not improve performance?

难免孤独 2020-11-27 02:17

I am investigating the effect of vectorization on program performance. To that end, I have written the following code:

#include 
#in         


        
4 answers
  •  执笔经年
    2020-11-27 03:13

    In case a[], b[] and c[] are competing for the L2 cache, try copying each block into small local arrays first:

    #include <string.h> /* for memcpy */
    
     ...
    
    gettimeofday(&stTime, NULL);

    /* Copy each block of 4 elements into small local arrays, multiply
       there, then copy the results back. Assumes LEN is a multiple of 4. */
    for(k = 0; k < LEN; k += 4) {
        double a4[4], b4[4], c4[4];
        memcpy(a4, a+k, sizeof a4);
        memcpy(b4, b+k, sizeof b4);
        c4[0] = a4[0] * b4[0];
        c4[1] = a4[1] * b4[1];
        c4[2] = a4[2] * b4[2];
        c4[3] = a4[3] * b4[3];
        memcpy(c+k, c4, sizeof c4);
    }

    gettimeofday(&endTime, NULL);
    

    On my machine this reduces the running time from 98429.000000 to 67213.000000; unrolling the loop 8-fold reduces it further to 57157.000000.
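
The 8-fold version is not shown above, so here is a minimal sketch of what it could look like, extending the same copy-multiply-copy pattern from 4 to 8 elements. The function name `mul_unrolled8` and its signature are assumptions made for illustration and are not from the original code. Like the 4-fold loop, it assumes the length is a multiple of the block size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h> /* for memcpy */

/* Hypothetical helper (not from the original post): computes
   c[i] = a[i] * b[i] for i in [0, n), handling 8 elements per iteration.
   Assumes n is a multiple of 8. */
void mul_unrolled8(const double *a, const double *b, double *c, size_t n)
{
    assert(n % 8 == 0);

    for (size_t k = 0; k < n; k += 8) {
        double a8[8], b8[8], c8[8];

        /* Copy the current block of each input into local storage. */
        memcpy(a8, a + k, sizeof a8);
        memcpy(b8, b + k, sizeof b8);

        /* Eight independent multiplications, written out by hand. */
        c8[0] = a8[0] * b8[0];
        c8[1] = a8[1] * b8[1];
        c8[2] = a8[2] * b8[2];
        c8[3] = a8[3] * b8[3];
        c8[4] = a8[4] * b8[4];
        c8[5] = a8[5] * b8[5];
        c8[6] = a8[6] * b8[6];
        c8[7] = a8[7] * b8[7];

        /* Copy the block of results back to the output array. */
        memcpy(c + k, c8, sizeof c8);
    }
}
```

To time it the same way as above, call `mul_unrolled8(a, b, c, LEN)` between the two `gettimeofday` calls. Whether it is faster than the 4-fold version depends on the compiler, the optimization flags and the machine, so measure rather than assume.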
