I am trying to write a fast BPSK decoder using Intel's AVX intrinsics. I have a set of complex numbers represented as interleaved floats, but because of the BPSK modulation only the real parts (the even-indexed floats) are needed. Every float x is mapped to 0 when x < 0 and to 1 when x >= 0. This is done by the following routine:

    static inline void normalize_bpsk_constellation_points(int32_t *out, const complex_t *in, size_t num)
    {
        static const __m256 _min_mask = _mm256_set1_ps(-1.0);
        static const __m256 _max_mask = _mm256_set1_ps(1.0);
        static const __m256 _mul_mask = _mm256_set1_ps(0.5);
        __m256 res;
        __m256i int_res;
        size_t i;

        /* Each iteration loads 8 floats, i.e. COMPLEX_PER_AVX_REG (expected to be 4) interleaved complex samples */
        for (i = 0; i < num; i += COMPLEX_PER_AVX_REG) {
            res = _mm256_load_ps((const float *)&in[i]);
            /* clamp them to avoid segmentation faults due to indexing */
            res = _mm256_max_ps(_min_mask, _mm256_min_ps(_max_mask, res));
            /* Scale accordingly for proper indexing -1->0, 1->1 */
            res = _mm256_add_ps(res, _max_mask);
            res = _mm256_mul_ps(res, _mul_mask);
            /* And then round to the nearest integer */
            res = _mm256_round_ps(res, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
            int_res = _mm256_cvtps_epi32(res);
            /* 8 results per iteration: even slots hold the real-part decisions, odd slots the imaginary parts */
            _mm256_store_si256((__m256i *)&out[2 * i], int_res);
        }
    }

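First, I clamp all received floats to the range [-1, 1]. Then, after adding 1 and multiplying by 0.5, which maps [-1, 1] onto [0, 1], the result is rounded to the nearest integer. That maps every scaled value above 0.5 (i.e. x > 0) to 1 and every scaled value below 0.5 (x < 0) to 0. (An input of exactly x = 0 scales to 0.5, which _MM_FROUND_TO_NEAREST_INT rounds to even, i.e. to 0.)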
The procedure works fine as long as none of the input floats is NaN. However, because of conditions at earlier stages, some input floats may be NaN or -NaN. In that case the NaN values propagate through _mm256_min_ps(), _mm256_max_ps() and the remaining AVX operations, and _mm256_cvtps_epi32() finally converts them to the integer -2147483648 (0x80000000), which of course crashes my program due to invalid indexing.
Is there any workaround for this problem, or at least a way to set the NaNs to 0 using AVX?