When compiling with gcc -O3, why does the following loop not vectorize (automatically):
#define SIZE (65536)
int a[SIZE], b[SIZE], c[SIZE];
in
The GCC vectorizer is probably not smart enough to vectorize the first loop. The addition case is easier to vectorize because a + 0 == a. Consider SIZE==4:
      i: 0 1 2 3
j: 0     X
j: 1     X X
j: 2     X X X
j: 3     X X X X
X denotes the combinations of i and j for which a[i] will be assigned to or increased, i.e. those with i <= j. For the case of addition, we can compute the results of b[i] > c[j] ? b[i] : c[j] for, say, j==1 and i==0..3 and put them into vector D. Then we only need to zero D[2..3] and add the resulting vector to a[0..3].

For the case of assignment, it is a little trickier. We must not only zero D[2..3], but also zero a[0..1], and only then combine the results (so that a[0..1] takes D[0..1] while a[2..3] keeps its old value). I guess this is where the vectorizer is failing.