How to calculate a single-vector dot product using SSE intrinsic functions in C

攒了一身酷 2020-12-08 08:12

I am trying to multiply two vectors together where each element of one vector is multiplied by the element in the same index at the other vector. I then want to sum all the

4 Answers

情话喂你 (OP)

2020-12-08 08:50

I'd say the fastest SSE method would be the following (note that _mm_movehdup_ps requires SSE3):

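#include <pmmintrin.h> // SSE3 (_mm_movehdup_ps); also pulls in the SSE/SSE2 headers

// Dot product of two 4-element float vectors held in SSE registers.
static inline float CalcDotProductSse(__m128 x, __m128 y) {
    __m128 mulRes, shufReg, sumsReg;
    mulRes = _mm_mul_ps(x, y);                 // Element-wise products

    // Calculates the sum of SSE Register - https://stackoverflow.com/a/35270026/195787
    shufReg = _mm_movehdup_ps(mulRes);         // Duplicate odd elements: 3,1 copied into 2,0
    sumsReg = _mm_add_ps(mulRes, shufReg);     // Element 0 = p0+p1, element 2 = p2+p3
    shufReg = _mm_movehl_ps(shufReg, sumsReg); // High half of sumsReg -> low half
    sumsReg = _mm_add_ss(sumsReg, shufReg);    // Element 0 = (p0+p1) + (p2+p3)
    return _mm_cvtss_f32(sumsReg);             // Result is in the lowest element of the register
}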

I followed the Stack Overflow answer "Fastest way to do horizontal float vector sum on x86".
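The function above handles a single pair of 4-element registers. For the case described in the question (two float arrays multiplied element by element and then summed), one possible sketch is below. The names `dot_product_sse` and `hsum_ps_sse1` are hypothetical, not from the original answer. The sketch also assumes unaligned input, and it swaps in an SSE1-only shuffle for `_mm_movehdup_ps` so that it builds without SSE3 enabled. Treat it as an illustration rather than a tuned implementation.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE1 intrinsics only */

/* Hypothetical helper: horizontal sum of the four elements of v.
 * Uses _mm_shuffle_ps instead of the SSE3 _mm_movehdup_ps. */
static inline float hsum_ps_sse1(__m128 v) {
    /* Swap neighbouring elements: result is [v1, v0, v3, v2]. */
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(v, shuf);  /* element 0 = v0+v1, element 2 = v2+v3 */
    shuf = _mm_movehl_ps(shuf, sums);   /* high half of sums -> low half */
    sums = _mm_add_ss(sums, shuf);      /* element 0 = (v0+v1) + (v2+v3) */
    return _mm_cvtss_f32(sums);
}

/* Hypothetical helper: dot product of two float arrays of length n.
 * Accumulates four partial sums in one register and reduces once at the end. */
static float dot_product_sse(const float *a, const float *b, size_t n) {
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;

    /* Main loop: four products per iteration, unaligned loads. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }

    float sum = hsum_ps_sse1(acc);

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Calling `dot_product_sse(a, b, n)` on two arrays of equal length returns their dot product. Doing the horizontal sum once after the loop, rather than once per 4 elements, keeps the shuffles out of the hot path. Because the additions happen in a different order than in a plain scalar loop, the result can differ from a scalar version in the last bits.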
