GCC SSE code optimization

说谎 2020-12-08 05:45

This post is closely related to another one I posted a few days ago. This time, I wrote a simple piece of code that just adds a pair of arrays of elements, multiplies the result by th

2 answers

执笔经年 (OP)

2020-12-08 06:02
I would like to extend chill's answer and draw your attention to the fact that GCC does not seem to make the same smart use of the AVX instructions when iterating backwards.

Just replace the inner loop in chill's sample code with:
```
for (i = N-1; i >= 0; --i)
    r[i] = (a[i] + b[i]) * c[i];
```
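chill's sample code is not reproduced in this thread, so the file below is a hypothetical stand-in for experimenting with the loop above. The global arrays `a`, `b`, `c` and `r` match the symbol names in the assembly listing, and N = 10000 is inferred from its 80000-byte offsets (10000 doubles of 8 bytes each); the function name `compute_backward` is made up for illustration.

```c
/* Hypothetical stand-in for chill's sample, not the original code. */
#define N 10000   /* inferred from the listing: 80000 bytes / sizeof(double) */

double a[N], b[N], c[N], r[N];

/* The backward loop from the answer, wrapped in a function. */
void compute_backward(void)
{
    int i;  /* must be signed: with an unsigned i, "i >= 0" would always be true */
    for (i = N-1; i >= 0; --i)
        r[i] = (a[i] + b[i]) * c[i];
}
```

Compiling this file with gcc -S -O3 -mavx and looking at the loop body shows which instructions your GCC version emits; the result may differ between GCC releases.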
GCC 4.8.4 with the options -S -O3 -mavx produces:
```
.L5:
    vmovsd  a+79992(%rax), %xmm0
    subq    $8, %rax
    vaddsd  b+80000(%rax), %xmm0, %xmm0
    vmulsd  c+80000(%rax), %xmm0, %xmm0
    vmovsd  %xmm0, r+80000(%rax)
    cmpq    $-80000, %rax
    jne     .L5
```
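The `sd` suffix in `vmovsd`, `vaddsd` and `vmulsd` stands for scalar double: each iteration loads, adds, multiplies and stores a single 8-byte element (note `subq $8, %rax`), instead of handling four doubles at once with packed instructions such as `vaddpd` and `vmulpd` on 256-bit `ymm` registers. As a sketch of one possible workaround (not something proposed in the thread), the backward traversal can be vectorized by hand with GCC's vector extensions: the assumed type `v4df` holds four doubles, and the loop walks down from the top end in blocks of four before finishing the remaining low-index elements with scalar code. The names `v4df` and `add_mul_backward` are hypothetical, and this is a sketch rather than a tuned implementation.

```c
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: four doubles (32 bytes) in one value. */
typedef double v4df __attribute__((vector_size(32)));

/* Computes r[i] = (a[i] + b[i]) * c[i] for i = n-1 down to 0,
   four elements per step while at least four remain. */
static void add_mul_backward(double *r, const double *a, const double *b,
                             const double *c, size_t n)
{
    size_t i = n;

    /* Vector part: handle the block [i-4, i) on each pass. */
    while (i >= 4) {
        i -= 4;
        v4df va, vb, vc, vr;
        memcpy(&va, a + i, sizeof va);   /* load without alignment requirement */
        memcpy(&vb, b + i, sizeof vb);
        memcpy(&vc, c + i, sizeof vc);
        vr = (va + vb) * vc;             /* element-wise add, then multiply */
        memcpy(r + i, &vr, sizeof vr);   /* store without alignment requirement */
    }

    /* Scalar tail: the remaining 0..3 lowest-index elements, still backwards. */
    while (i > 0) {
        --i;
        r[i] = (a[i] + b[i]) * c[i];
    }
}
```

memcpy is used for the loads and stores so the arrays need not be 32-byte aligned; at -O2 and above GCC normally turns such fixed-size memcpy calls into plain vector moves, and with -mavx it can keep `v4df` values in `ymm` registers. Because each r[i] depends only on a[i], b[i] and c[i], running the loop forwards produces the same results, which is a simpler option whenever the iteration order is not otherwise required.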