How to Calculate single-vector Dot Product using SSE intrinsic functions in C

攒了一身酷 2020-12-08 08:12

I am trying to multiply two vectors together, where each element of one vector is multiplied by the element at the same index in the other vector. I then want to sum all the

4 Answers
  •  情话喂你
    2020-12-08 08:50

    I'd say the fastest SSE method would be:

    #include <pmmintrin.h> // SSE3 header, needed for _mm_movehdup_ps (build with -msse3)

    static inline float CalcDotProductSse(__m128 x, __m128 y) {
        __m128 mulRes, shufReg, sumsReg;
        mulRes = _mm_mul_ps(x, y);                 // Element-wise products

        // Calculates the sum of SSE Register - https://stackoverflow.com/a/35270026/195787
        shufReg = _mm_movehdup_ps(mulRes);         // Broadcast elements 3,1 to 2,0
        sumsReg = _mm_add_ps(mulRes, shufReg);     // Element 0 = p0+p1, element 2 = p2+p3
        shufReg = _mm_movehl_ps(shufReg, sumsReg); // High Half -> Low Half
        sumsReg = _mm_add_ss(sumsReg, shufReg);    // Element 0 = p0+p1+p2+p3
        return  _mm_cvtss_f32(sumsReg);            // Result in the lower part of the SSE Register
    }
    

    I followed "Fastest Way to Do Horizontal Float Vector Sum On x86".
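
The function above reduces a single pair of registers. As a rough sketch, the same idea could be extended to whole arrays by accumulating products four at a time and doing the horizontal sum only once at the end. The names `hsum_ps_sse1` and `dot_product_sse` are illustrative and not part of the answer, the buffers are assumed to be unaligned `float` arrays, and the first shuffle uses SSE1 `_mm_shuffle_ps` in place of `_mm_movehdup_ps` so that the code builds without `-msse3`.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE1 intrinsics */

/* Horizontal sum of the four lanes of v (SSE1-only variant). */
static inline float hsum_ps_sse1(__m128 v) {
    /* Swap neighbours within each pair: (v1, v0, v3, v2) */
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    /* Lanes 0,1 hold v0+v1; lanes 2,3 hold v2+v3 */
    __m128 sums = _mm_add_ps(v, shuf);
    /* Bring the high half of sums down into the low half */
    shuf = _mm_movehl_ps(shuf, sums);
    /* Lane 0 now holds v0+v1+v2+v3 */
    sums = _mm_add_ss(sums, shuf);
    return _mm_cvtss_f32(sums);
}

/* Hypothetical helper: dot product of two float arrays of length n. */
static float dot_product_sse(const float *a, const float *b, size_t n) {
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;

    /* Main loop: multiply and accumulate four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 x = _mm_loadu_ps(a + i); /* unaligned loads, no alignment assumed */
        __m128 y = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(x, y));
    }

    /* One horizontal sum for the whole array, then a scalar tail. */
    float sum = hsum_ps_sse1(acc);
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Calling `dot_product_sse(a, b, n)` with two arrays of length `n` returns the sum of the element-wise products. Because the additions happen in a different order than in a plain scalar loop, results for non-trivial floating-point inputs may differ from the scalar version in the last few bits.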
