Is performing complex multiplication and division beneficial through SSE instructions? I know that addition and subtraction perform better when using SSE. Can someone tell me ho
The algorithm in the Intel optimization reference manual does not handle overflows and NaNs in the input properly.
A single NaN in the real or imaginary part of the number will incorrectly spread to the other part.
As several operations with infinity (e.g. infinity * 0) end in NaN, overflows can cause NaNs to appear in your otherwise well-behaved data.
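To make the failure mode concrete, here is a small scalar sketch in C. `make_cf` and `naive_mul` are hypothetical helper names, not taken from the Intel manual or any library; `naive_mul` simply applies the textbook formula, which is roughly what a straightforward SIMD kernel evaluates per lane.

```c
#include <complex.h>
#include <math.h>
#include <string.h>

/* Hypothetical helper: build a float complex from its two parts without
 * going through `re + im * I`, which can itself create NaNs when a part
 * is infinite. Relies on float complex having the layout of float[2]. */
static float complex make_cf(float re, float im)
{
    float parts[2] = { re, im };
    float complex z;
    memcpy(&z, parts, sizeof z);
    return z;
}

/* Textbook formula (a+bi)(c+di) = (ac-bd) + (ad+bc)i with no special
 * handling of infinities or NaNs. */
static float complex naive_mul(float complex x, float complex y)
{
    float a = crealf(x), b = cimagf(x);
    float c = crealf(y), d = cimagf(y);
    return make_cf(a * c - b * d, a * d + b * c);
}
```

With x = make_cf(INFINITY, INFINITY) and y = make_cf(0.0f, 1.0f), naive_mul(x, y) has NaN in both parts because of the infinity * 0 products. The compiler's own x * y should instead give -infinity in the real part and +infinity in the imaginary part, provided the build does not use options such as -ffast-math that switch off the IEEE-compliant complex multiply.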
If overflows and NaNs are rare, a simple way to avoid this is to check the result for NaN and recompute it with the compiler's IEEE-compliant implementation:
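float complex a[2], b[2];
/* dest is assumed here to be a 16-byte aligned float complex[2],
 * as required by _mm_store_ps */
_Alignas(16) float complex dest[2];
__m128 res = simd_fast_multiply(a, b);
/* store unconditionally; this can execute in parallel with the check,
 * making it almost free if there is no NaN in the data */
_mm_store_ps((float *)dest, res);
/* check for NaN: a lane compares unequal to itself only if it is NaN */
__m128 n = _mm_cmpneq_ps(res, res);
int have_nan = _mm_movemask_ps(n);
if (have_nan != 0) {
    /* do it again unvectorized */
    dest[0] = a[0] * b[0];
    dest[1] = a[1] * b[1];
}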