This post is closely related to another one I posted some days ago. This time, I wrote a simple code that just adds a pair of arrays of elements, multiplies the result by th
I would like to extend chill's answer and draw your attention on the fact that GCC seems not to be able to do the same smart use of the AVX instructions when iterating backwards.
Just replace the inner loop in chill's sample code with:
for (i = N-1; i >= 0; --i)
r[i] = (a[i] + b[i]) * c[i];
GCC (4.8.4) with options -S -O3 -mavx
produces:
.L5:
vmovsd a+79992(%rax), %xmm0
subq $8, %rax
vaddsd b+80000(%rax), %xmm0, %xmm0
vmulsd c+80000(%rax), %xmm0, %xmm0
vmovsd %xmm0, r+80000(%rax)
cmpq $-80000, %rax
jne .L5