How to calculate a single-vector dot product using SSE intrinsic functions in C

攒了一身酷 2020-12-08 08:12

I am trying to multiply two vectors together, where each element of one vector is multiplied by the element at the same index in the other vector. I then want to sum all the

4 answers
  •  北荒 (original poster)
    2020-12-08 08:50

    I wrote this and compiled it with gcc -O3 -S -ftree-vectorize -ftree-vectorizer-verbose=2 sse.c

    void f(int * __restrict__ a, int * __restrict__ b, int * __restrict__ c, int * __restrict__ d,
           int * __restrict__ e, int * __restrict__ f, int * __restrict__ g, int * __restrict__ h,
           int * __restrict__ o)
    {
        int i;
    
        for (i = 0; i < 8; ++i)
            o[i] = a[i]*e[i] + b[i]*f[i] + c[i]*g[i] + d[i]*h[i];
    }
    

    And GCC 4.3.0 auto-vectorized it:

    sse.c:5: note: LOOP VECTORIZED.
    sse.c:2: note: vectorized 1 loops in function.
    

    However, GCC only did that when the loop had enough iterations; otherwise the verbose output reported that vectorization was unprofitable or that the loop was too small. Without the __restrict__ keywords, it has to generate separate, non-vectorized versions to handle cases where the output o may point into one of the inputs.
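To illustrate why overlapping pointers matter, here is a minimal sketch. The two helper functions and the fixed length N are hypothetical, not taken from the code above. The first is the plain loop. The second models what a blind 4-wide vectorization would do: load and multiply all inputs first, then store. When the output overlaps the input, the two produce different results, which is why the compiler cannot vectorize unconditionally without __restrict__.

```c
enum { N = 4 };  /* assumed vector width, for illustration only */

/* Plain scalar loop: each store is visible to the loads that follow it. */
static void scale_scalar(int *out, const int *in)
{
    for (int i = 0; i < N; ++i)
        out[i] = in[i] * 2;
}

/* Models a 4-wide vector operation: read every input before writing
 * any output. Equivalent to scale_scalar only if out and in do not overlap. */
static void scale_load_first(int *out, const int *in)
{
    int tmp[N];
    for (int i = 0; i < N; ++i)
        tmp[i] = in[i] * 2;   /* one wide load + multiply */
    for (int i = 0; i < N; ++i)
        out[i] = tmp[i];      /* one wide store */
}
```

Calling both with out = buf + 1 and in = buf on a five-element buffer is enough to see the difference: the scalar loop feeds each freshly written value into the next iteration, while the load-first version does not.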

    I would paste the instructions as an example, but since part of the vectorization unrolled the loop it's not very readable.
