Fastest way to do horizontal vector sum with AVX instructions [duplicate]
问题 This question already has answers here : Get sum of values stored in __m256d with SSE/AVX (2 answers) Closed 11 months ago . I have a packed vector of four 64-bit floating-point values. I would like to get the sum of the vector's elements. With SSE (and using 32-bit floats) I could just do the following: v_sum = _mm_hadd_ps(v_sum, v_sum); v_sum = _mm_hadd_ps(v_sum, v_sum); Unfortunately, even though AVX features a _mm256_hadd_pd instruction, it differs in the result from the SSE version. I