GCC SSE code optimization

说谎 2020-12-08 05:45

This post is closely related to another one I posted some days ago. This time, I wrote a simple code that just adds a pair of arrays of elements, multiplies the result by th

2 Answers
  •  执笔经年
    2020-12-08 06:02

    I would like to extend chill's answer and draw your attention to the fact that GCC seems unable to make the same smart use of AVX instructions when the loop iterates backwards.

    Just replace the inner loop in chill's sample code with:

    for (i = N-1; i >= 0; --i)
        r[i] = (a[i] + b[i]) * c[i];
    

    GCC 4.8.4 with the options -S -O3 -mavx produces:

    .L5:
        vmovsd  a+79992(%rax), %xmm0
        subq    $8, %rax
        vaddsd  b+80000(%rax), %xmm0, %xmm0
        vmulsd  c+80000(%rax), %xmm0, %xmm0
        vmovsd  %xmm0, r+80000(%rax)
        cmpq    $-80000, %rax
        jne     .L5
    
