I'm writing SSE code for 2D convolution, but the SSE documentation is very sparse. I'm calculating the dot product with _mm_dp_ps and using _mm_extract_ps to get the result.
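_mm_cvtss_f32(_mm_shuffle_ps(__X, __X, __N)) will do the job: the shuffle moves the lane selected by the immediate __N into element 0, and _mm_cvtss_f32 returns that element as a float.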
_mm_cvtss_f32(_mm_shuffle_ps(__X, __X, __N))
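For what it's worth, here is a minimal sketch of that shuffle-then-convert idea in C. The names EXTRACT_F32 and unpack4 are made up for illustration and are not part of any header. The macro builds the immediate with _MM_SHUFFLE so that the chosen lane is broadcast, which guarantees it ends up in element 0. The underlying reason for the workaround is that _mm_extract_ps returns the lane's bit pattern as an int, not a float. Only SSE1 intrinsics are used here, so it should build without SSE4.1 flags.

```c
#include <xmmintrin.h>  /* SSE1: __m128, _mm_shuffle_ps, _mm_cvtss_f32, _MM_SHUFFLE */

/* Hypothetical helper (not a standard intrinsic): read lane `lane` (0..3)
   of an __m128 as a float. `lane` must be a compile-time constant because
   _mm_shuffle_ps takes an immediate operand. Broadcasting the lane to all
   four positions puts it in element 0, which _mm_cvtss_f32 then returns. */
#define EXTRACT_F32(v, lane) \
    _mm_cvtss_f32(_mm_shuffle_ps((v), (v), \
                                 _MM_SHUFFLE((lane), (lane), (lane), (lane))))

/* Illustrative use: copy all four lanes into a plain float array. */
static void unpack4(__m128 v, float out[4])
{
    out[0] = _mm_cvtss_f32(v);   /* lane 0 needs no shuffle */
    out[1] = EXTRACT_F32(v, 1);
    out[2] = EXTRACT_F32(v, 2);
    out[3] = EXTRACT_F32(v, 3);
}
```

If the dot product is computed as _mm_dp_ps(a, b, 0xF1), the sum is written to lane 0 only, so a plain _mm_cvtss_f32 on the result should be enough and the shuffle can be skipped.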