I am trying to multiply two vectors together where each element of one vector is multiplied by the element in the same index at the other vector. I then want to sum all the
I wrote this and compiled it with gcc -O3 -S -ftree-vectorize -ftree-vectorizer-verbose=2 sse.c
void f(int * __restrict__ a, int * __restrict__ b, int * __restrict__ c, int * __restrict__ d,
int * __restrict__ e, int * __restrict__ f, int * __restrict__ g, int * __restrict__ h,
int * __restrict__ o)
{
int i;
for (i = 0; i < 8; ++i)
o[i] = a[i]*e[i] + b[i]*f[i] + c[i]*g[i] + d[i]*h[i];
}
And GCC 4.3.0 auto-vectorized it:
sse.c:5: note: LOOP VECTORIZED.
sse.c:2: note: vectorized 1 loops in function.
However, it would only do that if I used a loop with enough iterations -- otherwise the verbose output would clarify that vectorization was unprofitable or the loop was too small. Without the __restrict__ keywords it has to generate separate, non-vectorized versions to deal with cases where the output o may point into one of the inputs.
I would paste the instructions as an example, but since part of the vectorization unrolled the loop it's not very readable.