GCC: vectorization difference between two similar loops

误落风尘 2020-12-24 06:18

When compiling with gcc -O3, why is the following loop not vectorized automatically:

#define SIZE (65536)

int a[SIZE], b[SIZE], c[SIZE];

in         


        
4 Answers
  •  慢半拍i (original poster)
    2020-12-24 06:41

    The GCC vectorizer is probably not smart enough to vectorize the first loop. The addition case is easier to vectorize because a + 0 == a. Consider SIZE==4:

      0 1 2 3 i
    0 X
    1 X X
    2 X X X
    3 X X X X
    j
    

    X marks the combinations of i and j for which a[i] is assigned or incremented. For addition, we can compute b[i] > c[j] ? b[i] : c[j] for, say, j==1 and i==0..3, and put the results into a vector D. Then we only need to zero D[2..3] and add the resulting vector to a[0..3]. Assignment is a little trickier: we must not only zero D[2..3] but also zero a[0..1], and only then combine the results. I guess this is where the vectorizer fails.
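The row-by-row idea above can be sketched in plain C. This is only an illustration under assumptions: the loops in the question are truncated, so the triangular loop nest (j running from i up to the array size) is inferred from the diagram, and the names N, add_rowwise, assign_rowwise and rowwise_matches_scalar are hypothetical. It does not show what GCC actually generates; it only shows that assignment needs one extra masking step compared with addition.

```c
#include <string.h>

#define N 4 /* small stand-in for SIZE, matching the SIZE==4 walkthrough */

static int max_int(int x, int y) { return x > y ? x : y; }

/* Assumed scalar loops: a triangular nest, inferred from the diagram. */
void add_scalar(int a[N], const int b[N], const int c[N]) {
    for (int i = 0; i < N; i++)
        for (int j = i; j < N; j++)
            a[i] += max_int(b[i], c[j]);
}

void assign_scalar(int a[N], const int b[N], const int c[N]) {
    for (int i = 0; i < N; i++)
        for (int j = i; j < N; j++)
            a[i] = max_int(b[i], c[j]);
}

/* Addition, one row j at a time: build D, zero the lanes with i > j, add. */
void add_rowwise(int a[N], const int b[N], const int c[N]) {
    for (int j = 0; j < N; j++) {
        int d[N];
        for (int i = 0; i < N; i++) d[i] = max_int(b[i], c[j]);
        for (int i = j + 1; i < N; i++) d[i] = 0; /* outside the triangle */
        for (int i = 0; i < N; i++) a[i] += d[i]; /* adding 0 is harmless */
    }
}

/* Assignment: the same mask on D, plus clearing a[0..j] before combining. */
void assign_rowwise(int a[N], const int b[N], const int c[N]) {
    for (int j = 0; j < N; j++) {
        int d[N];
        for (int i = 0; i < N; i++) d[i] = max_int(b[i], c[j]);
        for (int i = j + 1; i < N; i++) d[i] = 0; /* outside the triangle */
        for (int i = 0; i <= j; i++) a[i] = 0;    /* lanes being overwritten */
        for (int i = 0; i < N; i++) a[i] += d[i]; /* combine both halves */
    }
}

/* Returns 1 if the row-wise versions agree with the scalar loops. */
int rowwise_matches_scalar(void) {
    const int b[N] = {5, 1, 7, 3};
    const int c[N] = {2, 6, 4, 8};
    int a1[N] = {10, 20, 30, 40};
    int a2[N] = {10, 20, 30, 40};

    add_scalar(a1, b, c);
    add_rowwise(a2, b, c);
    if (memcmp(a1, a2, sizeof a1) != 0) return 0;

    assign_scalar(a1, b, c);
    assign_rowwise(a2, b, c);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

Each inner loop over i in the row-wise functions is a simple element-wise operation, which is the shape a vectorizer handles well. To see what GCC itself reports for a given loop, compiling with gcc -O3 -fopt-info-vec-missed prints the loops it declined to vectorize.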
