Question
Basically, how can I write the equivalent of this with AVX2 intrinsics? We assume here that result_in_float is of type __m256, while result is of type short int* or short int[8].
for(i = 0; i < 8; i++)
result[i] = (short int)result_in_float[i];
I know that floats can be converted to 32-bit integers using the __m256i _mm256_cvtps_epi32(__m256 m1) intrinsic, but I have no idea how to convert these 32-bit integers further to 16-bit integers. And I don't want just that: I also want to store those values (as 16-bit integers) to memory, and I want to do all of it using vector instructions.
Searching around the internet, I found an intrinsic by the name of _mm256_mask_storeu_epi16, but I'm not really sure if that would do the trick, as I couldn't find an example of its usage.
Answer 1:
_mm256_cvtps_epi32 is a good first step. The conversion to a packed vector of shorts is a bit annoying: it requires a cross-slice shuffle, i.e. one that moves data between the two 128-bit halves of the register (so it's good that it's not in a dependency chain here).
Since the values can be assumed to be in the right range (as per the comment), meaning they fit in a signed 16-bit integer, we can use _mm256_packs_epi32 instead of _mm256_shuffle_epi8 to do the conversion. Either way it's a 1-cycle instruction on port 5, but using _mm256_packs_epi32 avoids having to get a shuffle mask from somewhere.
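To make the "cross-slice" issue concrete, here is a minimal sketch of what _mm256_packs_epi32 does on its own, without any fix-up afterwards. The helper name pack_without_permute and its pointer-based signature are made up for this illustration, and the target attribute is a GCC/Clang extension used only so the snippet builds without -mavx2.

```c
#include <immintrin.h>

// Hypothetical helper (not from the original answer): packs two vectors of
// eight 32-bit ints into sixteen 16-bit ints and stores the raw result,
// without reordering the 128-bit halves afterwards.
__attribute__((target("avx2")))  // GCC/Clang-specific; alternatively build with -mavx2
void pack_without_permute(const int *a, const int *b, short *out)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    // Packs with signed saturation, separately within each 128-bit half.
    __m256i packed = _mm256_packs_epi32(va, vb);
    _mm256_storeu_si256((__m256i *)out, packed);
}
```

With a = {0, 1, 2, 3, 4, 5, 6, 40000} and b = {-40000, 11, 12, 13, 14, 15, 16, 17}, out ends up as a0..a3, b0..b3, a4..a7, b4..b7, with 40000 clamped to 32767 and -40000 clamped to -32768. The interleaved order is why a 64-bit permute is still needed, and the clamping is why the instruction is only a plain narrowing conversion when the inputs are already in range.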
So, to put it together (not tested):
__m256i tmp = _mm256_cvtps_epi32(result_in_float);
tmp = _mm256_packs_epi32(tmp, _mm256_setzero_si256());
tmp = _mm256_permute4x64_epi64(tmp, 0xD8);
__m128i res = _mm256_castsi256_si128(tmp);
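_mm_storeu_si128((__m128i *)result, res); // unaligned store of the 8 shorts; _mm_store_si128 also works if result is 16-byte aligned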
The last step (the cast) is free; it just changes the type.
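For reference, the single-vector sequence can be wrapped into a small self-contained function, sketched below. The name floats8_to_shorts and the pointer-based interface are assumptions made for this example (the question starts from a __m256 already in a register), and the target attribute is a GCC/Clang extension. One thing to be aware of: _mm256_cvtps_epi32 rounds according to the current rounding mode (round-to-nearest-even by default), whereas the C cast in the question truncates toward zero; _mm256_cvttps_epi32 is the truncating variant if that behaviour is wanted.

```c
#include <immintrin.h>

// Hypothetical helper: converts 8 floats at src to 8 shorts at dst.
// Assumes every value fits in a signed 16-bit integer after rounding.
__attribute__((target("avx2")))  // GCC/Clang-specific; alternatively build with -mavx2
void floats8_to_shorts(const float *src, short *dst)
{
    __m256 v = _mm256_loadu_ps(src);
    // float -> int32, rounding per the current mode (nearest-even by default)
    __m256i tmp = _mm256_cvtps_epi32(v);
    // int32 -> int16 within each 128-bit half; upper part of each half is zero
    tmp = _mm256_packs_epi32(tmp, _mm256_setzero_si256());
    // move the two useful 64-bit chunks next to each other in the low half
    tmp = _mm256_permute4x64_epi64(tmp, 0xD8);
    // the cast only changes the type; the store writes the low 128 bits
    _mm_storeu_si128((__m128i *)dst, _mm256_castsi256_si128(tmp));
}
```

Calling it on {1.0f, -2.0f, 3.5f, 4.5f, ...} under the default rounding mode gives 1, -2, 4, 4, ... since ties go to the even integer, which differs from the 3 and 4 a plain cast would produce.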
If you had two vectors of floats to convert, you could re-use most of the instructions, e.g. (not tested either):
__m256i tmp1 = _mm256_cvtps_epi32(result_in_float1);
__m256i tmp2 = _mm256_cvtps_epi32(result_in_float2);
tmp1 = _mm256_packs_epi32(tmp1, tmp2);
tmp1 = _mm256_permute4x64_epi64(tmp1, 0xD8);
// store tmp1 (16 shorts) with _mm256_storeu_si256, or _mm256_store_si256 if the destination is 32-byte aligned
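The two-vector form extends naturally to a whole array, processed 16 floats at a time. The following is only a sketch under stated assumptions: the function name floats_to_shorts is made up, n is assumed to be a multiple of 16 (any leftover elements are simply not converted), all values are assumed to fit in a signed 16-bit integer, and the target attribute is a GCC/Clang extension.

```c
#include <immintrin.h>
#include <stddef.h>

// Hypothetical helper: converts n floats to n shorts, 16 per iteration.
// Assumes n is a multiple of 16; a remainder would need separate handling.
__attribute__((target("avx2")))  // GCC/Clang-specific; alternatively build with -mavx2
void floats_to_shorts(const float *src, short *dst, size_t n)
{
    for (size_t i = 0; i + 16 <= n; i += 16) {
        // Two float -> int32 conversions (rounding per the current mode)
        __m256i lo = _mm256_cvtps_epi32(_mm256_loadu_ps(src + i));
        __m256i hi = _mm256_cvtps_epi32(_mm256_loadu_ps(src + i + 8));
        // One pack and one permute are shared by both input vectors
        __m256i packed = _mm256_packs_epi32(lo, hi);
        packed = _mm256_permute4x64_epi64(packed, 0xD8);
        // Unaligned store, so dst needs no particular alignment
        _mm256_storeu_si256((__m256i *)(dst + i), packed);
    }
}
```

Compared with the single-vector version, the pack and the permute are amortized over twice as many elements, which is the re-use the answer refers to.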
Source: https://stackoverflow.com/questions/41228180/how-can-i-convert-a-vector-of-float-to-short-int-using-avx-instructions