I'm writing SSE code for 2D convolution, but the SSE documentation is very sparse. I'm calculating the dot product with _mm_dp_ps and using _mm_extract_ps to get the dot product result.
extern void _mm_store_ss(float*, __m128);
See 'xmmintrin.h'.
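As a minimal sketch of how the two intrinsics fit together, the helper below computes a 4-element dot product with _mm_dp_ps and reads the result back with _mm_store_ss. The function name `dot4` and the `SSE41_TARGET` macro are made up for illustration, and the code assumes an x86 CPU with SSE4.1 and unaligned input arrays.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_dp_ps; also pulls in xmmintrin.h */

/* GCC/Clang need SSE4.1 enabled for this function (or compile with -msse4.1).
   SSE41_TARGET is a hypothetical convenience macro, not part of any header. */
#if defined(__GNUC__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

/* Hypothetical helper: dot product of two arrays of 4 floats. */
SSE41_TARGET
static float dot4(const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);  /* unaligned loads, no alignment assumed */
    __m128 vb = _mm_loadu_ps(b);

    /* Mask 0xF1: high nibble F multiplies all four lanes,
       low nibble 1 writes the sum to lane 0 only. */
    __m128 dp = _mm_dp_ps(va, vb, 0xF1);

    float result;
    _mm_store_ss(&result, dp);    /* store lane 0 as a float */
    return result;
}
```

Calling `dot4` on {1, 2, 3, 4} and {5, 6, 7, 8} should give 70. Note that _mm_extract_ps returns an int holding the bit pattern of the selected lane, not a float, which is why _mm_store_ss (or _mm_cvtss_f32, also from xmmintrin.h) is the more direct way to get the scalar out.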