Here's the sample C code that I am trying to accelerate using SSE. The two arrays are 3072 elements long and hold doubles; I may drop down to float if I don't need the precision.
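The maximum of -x and x is abs(x), and SSE has no packed abs instruction for floating point, so you can build it from a subtraction and a max. For four floats at a time:
x = _mm_max_ps(_mm_sub_ps(_mm_setzero_ps(), x), x);
For doubles, the SSE2 equivalents (_mm_setzero_pd, _mm_sub_pd, _mm_max_pd) work the same way on two values at a time.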
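A minimal sketch of applying that expression across whole arrays, assuming the element count is a multiple of the vector width (true for 3072) and using unaligned loads/stores so the arrays need not be 16-byte aligned. The function names abs_floats_sse and abs_doubles_sse2 are hypothetical helpers for illustration, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h> /* SSE2 (_pd intrinsics); also includes the SSE _ps intrinsics */

/* Hypothetical helper: out[i] = abs(in[i]) computed as max(0 - x, x),
 * four floats per iteration. Assumes n is a multiple of 4. */
void abs_floats_sse(const float *in, float *out, size_t n)
{
    assert(n % 4 == 0);
    const __m128 zero = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4) {
        __m128 x = _mm_loadu_ps(in + i);          /* unaligned load of 4 floats */
        x = _mm_max_ps(_mm_sub_ps(zero, x), x);   /* max(-x, x) == abs(x) */
        _mm_storeu_ps(out + i, x);                /* unaligned store */
    }
}

/* Hypothetical helper: the same idea for doubles, two per iteration.
 * Assumes n is a multiple of 2. */
void abs_doubles_sse2(const double *in, double *out, size_t n)
{
    assert(n % 2 == 0);
    const __m128d zero = _mm_setzero_pd();
    for (size_t i = 0; i < n; i += 2) {
        __m128d x = _mm_loadu_pd(in + i);
        x = _mm_max_pd(_mm_sub_pd(zero, x), x);
        _mm_storeu_pd(out + i, x);
    }
}
```

One edge case: when both inputs to MAXPS/MAXPD are zero it returns the second operand, so for x = -0.0 this trick yields -0.0 rather than +0.0. If that matters, clearing the sign bit instead, e.g. _mm_andnot_ps(_mm_set1_ps(-0.0f), x), is a common alternative.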