SSE: convert short integer to float

Submitted by 99封情书 on 2019-12-17 18:55:33

Question


I want to convert an array of unsigned short numbers to float using SSE. Let's say

__m128i xVal;     // Has 8 16-bit unsigned integers
__m128 y1, y2;    // 2 xmm registers for 8 float values

I want the first 4 uint16 values in y1 and the next 4 in y2. Which SSE intrinsics should I use?


Answer 1:


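You first need to unpack your vector of 8 x 16-bit unsigned shorts into two vectors of 32-bit ints by interleaving with zeros, then convert each of these vectors to float. The conversion intrinsic treats its input as signed, but that is harmless here because every zero-extended value is at most 65535: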

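// x is the input vector of 8 uint16 values (xVal in the question)
__m128i xlo = _mm_unpacklo_epi16(x, _mm_set1_epi16(0));  // zero-extend elements 0..3 to 32 bits
__m128i xhi = _mm_unpackhi_epi16(x, _mm_set1_epi16(0));  // zero-extend elements 4..7 to 32 bits
__m128 ylo = _mm_cvtepi32_ps(xlo);                       // int32 -> float
__m128 yhi = _mm_cvtepi32_ps(xhi);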
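
For anyone who wants to try this on an actual array, here is a minimal sketch of the same unpack-then-convert steps wrapped in a function. The helper name `convert_u16x8_to_f32` and the unaligned load/store are my own assumptions for illustration, not part of the answer above; it assumes an x86 target with SSE2.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: converts 8 uint16_t values at src into 8 floats at dst.
   Unaligned loads/stores are used so the pointers need no special alignment. */
static void convert_u16x8_to_f32(const uint16_t *src, float *dst)
{
    __m128i x    = _mm_loadu_si128((const __m128i *)src);
    __m128i zero = _mm_setzero_si128();

    /* Interleave with zeros: each 16-bit value becomes the low half
       of a 32-bit lane, i.e. it is zero-extended. */
    __m128i xlo = _mm_unpacklo_epi16(x, zero);  /* elements 0..3 */
    __m128i xhi = _mm_unpackhi_epi16(x, zero);  /* elements 4..7 */

    /* Signed int32 -> float; exact because every value is <= 65535. */
    _mm_storeu_ps(dst,     _mm_cvtepi32_ps(xlo));
    _mm_storeu_ps(dst + 4, _mm_cvtepi32_ps(xhi));
}
```

Calling it in a loop over an array, 8 elements per iteration, would convert the whole array as long as its length is a multiple of 8; a scalar tail loop would be needed otherwise.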



Answer 2:


I would suggest using a slightly different version:

const __m128i magicInt = _mm_set1_epi16(0x4B00);      // upper 16 bits of the float 2^23
const __m128 magicFloat = _mm_set1_ps(8388608.0f);    // 2^23

// Each 32-bit lane becomes 0x4B00xxxx, the bit pattern of the float 2^23 + x
__m128i xlo = _mm_unpacklo_epi16(x, magicInt);
__m128i xhi = _mm_unpackhi_epi16(x, magicInt);
// Reinterpret as float and subtract 2^23 to leave x
__m128 ylo = _mm_sub_ps(_mm_castsi128_ps(xlo), magicFloat);
__m128 yhi = _mm_sub_ps(_mm_castsi128_ps(xhi), magicFloat);

At the assembly level, the only difference from Paul R's version is the use of _mm_sub_ps (the SUBPS instruction) instead of _mm_cvtepi32_ps (the CVTDQ2PS instruction). _mm_sub_ps is never slower than _mm_cvtepi32_ps, and is actually faster on old CPUs and on low-power CPUs (read: Intel Atom and AMD Bobcat).
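
To see why the magic numbers work: 0x4B000000 is the IEEE 754 single-precision bit pattern of 2^23 = 8388608.0f. At that magnitude one unit in the last place of the 23-bit mantissa equals exactly 1, so OR-ing a 16-bit integer x into the low bits gives the float 2^23 + x, and subtracting 2^23 leaves x exactly. A rough sketch of this, first in scalar form and then with the intrinsics, is below. The function names `magic_u16_to_f32` and `convert_u16x8_magic` are made up for illustration, and an x86 target with SSE2 is assumed.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Scalar illustration of the trick (hypothetical helper). */
static float magic_u16_to_f32(uint16_t v)
{
    uint32_t bits = 0x4B000000u | v;  /* bit pattern of the float 2^23 + v */
    float f;
    memcpy(&f, &bits, sizeof f);      /* reinterpret the bits as a float */
    return f - 8388608.0f;            /* remove the 2^23 bias */
}

/* SSE2 version: 8 uint16_t values at src -> 8 floats at dst (hypothetical helper). */
static void convert_u16x8_magic(const uint16_t *src, float *dst)
{
    const __m128i magic_int   = _mm_set1_epi16(0x4B00);
    const __m128  magic_float = _mm_set1_ps(8388608.0f);

    __m128i x = _mm_loadu_si128((const __m128i *)src);

    /* Interleaving puts x in the low half and 0x4B00 in the high half
       of every 32-bit lane. */
    __m128i xlo = _mm_unpacklo_epi16(x, magic_int);
    __m128i xhi = _mm_unpackhi_epi16(x, magic_int);

    _mm_storeu_ps(dst,     _mm_sub_ps(_mm_castsi128_ps(xlo), magic_float));
    _mm_storeu_ps(dst + 4, _mm_sub_ps(_mm_castsi128_ps(xhi), magic_float));
}
```

Both functions should agree with a plain `(float)` cast for every 16-bit input, since all values up to 65535 are exactly representable in single precision.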



Source: https://stackoverflow.com/questions/9161807/sse-convert-short-integer-to-float
