
Question:

I am trying to create a fast decoder for BPSK using Intel's AVX intrinsics. I have a set of complex numbers that are represented as interleaved floats, but due to the BPSK modulation only the real part (the even-indexed floats) is needed. Every float x is mapped to 0 if x < 0 and to 1 if x >= 0. This is accomplished using the following routine:

    static inline void normalize_bpsk_constellation_points(int32_t *out, const complex_t *in, size_t num)
    {
        static const __m256 _min_mask = _mm256_set1_ps(-1.0);
        static const __m256 _max_mask = _mm256_set1_ps(1.0);
        static const __m256 _mul_mask = _mm256_set1_ps(0.5);

        __m256 res;
        __m256i int_res;

        size_t i;
        gr_complex temp;
        float real;

        for (i = 0; i < num; i += COMPLEX_PER_AVX_REG) {
            res = _mm256_load_ps((float *)&in[i]);

            /* clamp them to avoid segmentation faults due to indexing */
            res = _mm256_max_ps(_min_mask, _mm256_min_ps(_max_mask, res));

            /* Scale accordingly for proper indexing -1->0, 1->1 */
            res = _mm256_add_ps(res, _max_mask);
            res = _mm256_mul_ps(res, _mul_mask);

            /* And then round to the nearest integer */
            res = _mm256_round_ps(res, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);

            int_res = _mm256_cvtps_epi32(res);

            _mm256_store_si256((__m256i *)&out[2 * i], int_res);
        }
    }

First, I clamp all the received floats to the range [-1, 1]. Then, after adding 1 and multiplying by 0.5, the result is rounded to the nearest integer. That maps every scaled value above 0.5 to 1 and every scaled value below 0.5 to 0.

The procedure works fine if the input floats are normal numbers. However, because of problems in earlier stages, some input floats may be NaN or -NaN. In that case, the NaN values propagate through _mm256_max_ps(), _mm256_min_ps() and all the other AVX functions, resulting in an integer mapping of -2147483648, which of course causes my program to crash due to invalid indexing.
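The failure described above can be reproduced for a single value. The following is a minimal sketch, not the asker's code: `bpsk_lane_original` is a hypothetical helper that broadcasts one float, runs the same clamp, scale, round and convert sequence as the routine in the question, and returns lane 0. The `__attribute__((target("avx")))` annotation is an assumption that the compiler is GCC or Clang; it can be dropped when building with `-mavx`. A NaN sits in the second operand of both the min and the max, so it survives the clamp, and the float-to-int conversion then yields 0x80000000.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper (not from the question): pushes one float through the
 * question's clamp / scale / round / convert sequence and returns lane 0. */
__attribute__((target("avx")))
static int32_t bpsk_lane_original(float x)
{
    const __m256 min_mask = _mm256_set1_ps(-1.0f);
    const __m256 max_mask = _mm256_set1_ps(1.0f);
    const __m256 mul_mask = _mm256_set1_ps(0.5f);
    int32_t lanes[8];

    __m256 res = _mm256_set1_ps(x);

    /* Same operand order as the question: a NaN in res is the second
     * operand of the min, and the resulting NaN is the second operand of
     * the max, so it is returned unchanged both times. */
    res = _mm256_max_ps(min_mask, _mm256_min_ps(max_mask, res));

    res = _mm256_add_ps(res, max_mask);
    res = _mm256_mul_ps(res, mul_mask);
    res = _mm256_round_ps(res, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);

    /* Converting NaN to int32 produces the "integer indefinite" value
     * 0x80000000, i.e. INT32_MIN. */
    _mm256_storeu_si256((__m256i *)lanes, _mm256_cvtps_epi32(res));
    return lanes[0];
}
```

Calling the helper with 0.7f gives 1, with -0.7f gives 0, and with a NaN gives INT32_MIN (-2147483648), which matches the invalid index reported in the question.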

Is there any workaround to avoid this problem, or at least set the NaN to 0 using AVX?

Answer 1:

You could do it the simple way to begin with, compare and mask: (not tested)

    res = _mm256_cmp_ps(res, _mm256_setzero_ps(), _CMP_NLT_US);
    ires = _mm256_srli_epi32(_mm256_castps_si256(res), 31);

Or shift and xor: (also not tested)

    ires = _mm256_srli_epi32(_mm256_castps_si256(res), 31);
    ires = _mm256_xor_si256(ires, _mm256_set1_epi32(1));

This version looks only at the sign bit, so it decodes a NaN according to its sign and ignores the NaN-ness.
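For reference, here is how the two AVX2 variants might look as complete loops. This is a sketch under some assumptions: the function names and the flat `float` input (one output integer per input float, as in the question's `out[2*i]` layout) are made up for illustration, unaligned loads and stores are used so no alignment is required, and `__attribute__((target("avx2")))` assumes GCC or Clang (drop it when building with `-mavx2`). It uses the immediate-count shift `_mm256_srli_epi32` and the AVX2 `_mm256_xor_si256`. Note that `_CMP_NLT_US` is an unordered predicate, so a NaN lane compares true and decodes to 1 in the compare-and-mask variant.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Compare and mask: lanes where !(x < 0) become all-ones, and a logical
 * right shift by 31 turns all-ones into 1. NaN lanes compare true. */
__attribute__((target("avx2")))
static void bpsk_bits_compare(int32_t *out, const float *in, size_t num_floats)
{
    assert(num_floats % 8 == 0);   /* sketch: no tail handling */
    const __m256 zero = _mm256_setzero_ps();

    for (size_t i = 0; i < num_floats; i += 8) {
        __m256 x = _mm256_loadu_ps(in + i);
        __m256 mask = _mm256_cmp_ps(x, zero, _CMP_NLT_US);
        __m256i bits = _mm256_srli_epi32(_mm256_castps_si256(mask), 31);
        _mm256_storeu_si256((__m256i *)(out + i), bits);
    }
}

/* Shift and xor: move the sign bit down to bit 0, then invert it.
 * Only the sign bit matters, so -0.0f and negative NaNs decode to 0. */
__attribute__((target("avx2")))
static void bpsk_bits_signbit(int32_t *out, const float *in, size_t num_floats)
{
    assert(num_floats % 8 == 0);   /* sketch: no tail handling */
    const __m256i one = _mm256_set1_epi32(1);

    for (size_t i = 0; i < num_floats; i += 8) {
        __m256 x = _mm256_loadu_ps(in + i);
        __m256i sign = _mm256_srli_epi32(_mm256_castps_si256(x), 31);
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_xor_si256(sign, one));
    }
}
```

Neither variant can produce anything other than 0 or 1, so the result is always safe to use as an index, whatever the input contains.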

Alternative without AVX2 (not tested):

    res = _mm256_cmp_ps(res, _mm256_setzero_ps(), _CMP_NLT_US);
    res = _mm256_and_ps(res, _mm256_set1_ps(1.0f));
    ires = _mm256_cvtps_epi32(res);
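If the goal is specifically to send NaN to 0, as the question asks, the AVX-only variant could be written with an ordered predicate instead. The sketch below makes that one swap, which is my assumption rather than part of the answer: `_CMP_GE_OQ` (ordered, quiet) is false for NaN, whereas `_CMP_NLT_US` is true. The function name and the flat `float` input are hypothetical, and `__attribute__((target("avx")))` assumes GCC or Clang (drop it when building with `-mavx`).

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* AVX-only decode: x >= 0 gives 1, x < 0 gives 0, NaN gives 0.
 * Uses only 256-bit float operations plus the float-to-int conversion. */
__attribute__((target("avx")))
static void bpsk_bits_avx1(int32_t *out, const float *in, size_t num_floats)
{
    assert(num_floats % 8 == 0);   /* sketch: no tail handling */
    const __m256 zero = _mm256_setzero_ps();
    const __m256 one = _mm256_set1_ps(1.0f);

    for (size_t i = 0; i < num_floats; i += 8) {
        __m256 x = _mm256_loadu_ps(in + i);

        /* Ordered compare: any lane holding NaN yields an all-zero mask. */
        __m256 mask = _mm256_cmp_ps(x, zero, _CMP_GE_OQ);

        /* All-ones AND 1.0f is 1.0f; all-zeros AND 1.0f is 0.0f. */
        __m256 bit = _mm256_and_ps(mask, one);

        /* Both 0.0f and 1.0f convert exactly, so rounding mode is irrelevant. */
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_cvtps_epi32(bit));
    }
}
```

The conversion only ever sees 0.0f or 1.0f here, so the -2147483648 result from the question cannot occur.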

Answer 2:

Harold posted a good solution for the question you were really asking, but I want to make clear that eliminating NaN values while clamping is totally straightforward. If either argument is a NaN, MINPS and MAXPS simply return the second argument. So all you need to do is swap the argument order and NaNs will be clamped as well. For example, the following would clamp NaNs to _min_mask:

    res = _mm256_max_ps(_mm256_min_ps(_max_mask, res), _min_mask);
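To see the operand-order effect in isolation, here is a small sketch of just the clamp step. The function name and the array interface are hypothetical, and `__attribute__((target("avx")))` assumes GCC or Clang (drop it when building with `-mavx`). The inner min still passes a NaN through, because the NaN is its second operand; the outer max then has the NaN as its first operand and returns its second operand, the lower bound.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Clamp every float to [-1, 1]; NaN inputs come out as -1.0f. */
__attribute__((target("avx")))
static void clamp_unit_nan_to_min(float *out, const float *in, size_t num_floats)
{
    assert(num_floats % 8 == 0);   /* sketch: no tail handling */
    const __m256 min_mask = _mm256_set1_ps(-1.0f);
    const __m256 max_mask = _mm256_set1_ps(1.0f);

    for (size_t i = 0; i < num_floats; i += 8) {
        __m256 res = _mm256_loadu_ps(in + i);

        /* min(max_mask, res): a NaN in res is the second operand and is
         * returned as-is. max(that, min_mask): the NaN is now the first
         * operand, so the second operand (min_mask) is returned. */
        res = _mm256_max_ps(_mm256_min_ps(max_mask, res), min_mask);

        _mm256_storeu_ps(out + i, res);
    }
}
```

With NaN clamped to -1.0, the rest of the question's pipeline (add 1, multiply by 0.5, round, convert) maps it to 0.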

Source: Comparison with NaN using AVX
