How to perform uint32/float conversion with SSE?

长发绾君心 2021-01-02 19:33

In SSE there is a function _mm_cvtepi32_ps(__m128i input) which takes an input vector of 32-bit signed integers (int32_t) and converts them into

3 answers
  •  慢半拍i (original poster)
    2021-01-02 19:54

    I think Paul's answer is nice, but it fails for v=4294967295U (=2^32-1). In that case v2=2^31-1 and v1=2^31. The bit pattern of v1 (0x80000000) is interpreted by _mm_cvtepi32_ps as the signed value -2^31, so v1 is converted to -2.14748365E9. v2=2^31-1 is not exactly representable as a float and is rounded up to 2.14748365E9. Consequently _mm_add_ps returns 0, because after rounding v1f and v2f are exact opposites of each other.
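    The failure can be reproduced with a small sketch. The function below is only a reconstruction, assumed from the names v1, v2, v1f and v2f used above (v2 = v >> 1, v1 = v - v2); Paul's actual code is not shown in this thread, and halves_cvt and halves_cvt_lane0 are hypothetical names.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Assumed reconstruction of the "two halves" approach: split v into
   v2 = v >> 1 and v1 = v - v2, convert both as signed, then add. */
static __m128 halves_cvt(const __m128i v)
{
    __m128i v2  = _mm_srli_epi32(v, 1);   /* logical shift: floor(v / 2) */
    __m128i v1  = _mm_sub_epi32(v, v2);   /* v - floor(v / 2) */
    __m128  v2f = _mm_cvtepi32_ps(v2);
    __m128  v1f = _mm_cvtepi32_ps(v1);    /* 0x80000000 is read as -2^31 */
    return _mm_add_ps(v2f, v1f);
}

/* Hypothetical convenience wrapper: convert one value, return lane 0. */
static float halves_cvt_lane0(uint32_t x)
{
    return _mm_cvtss_f32(halves_cvt(_mm_set1_epi32((int32_t)x)));
}
```

    With this sketch, halves_cvt_lane0(4294967295U) evaluates to 0.0f instead of 4294967296.0f, while inputs such as 1000U or 2147483648U convert correctly. 2^32-1 is the only input for which v1 reaches 2^31.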

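    The idea of the solution below is to split v into two parts: the most significant bit of v is copied to v_high and the remaining 31 bits are copied to v_low. Since v_low is at most 2^31-1, it can be converted safely with _mm_cvtepi32_ps. v_high is converted to either 0 or 2.14748365E9 (2^31), and the two floats are added.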

    #include <emmintrin.h>  /* SSE2 */

    static inline __m128 _mm_cvtepu32_v3_ps(const __m128i v)
    {
        const __m128i msk0     = _mm_set1_epi32(0x7FFFFFFF);  /* selects bits 0..30 */
        const __m128i zero     = _mm_setzero_si128();
        const __m128i cnst2_31 = _mm_set1_epi32(0x4F000000);  /* IEEE representation of float 2^31 */

        __m128i v_high  = _mm_andnot_si128(msk0, v);      /* bit 31 only */
        __m128i v_low   = _mm_and_si128(msk0, v);         /* bits 0..30, never negative */
        __m128  v_lowf  = _mm_cvtepi32_ps(v_low);         /* signed conversion is safe here */
        __m128i msk1    = _mm_cmpeq_epi32(v_high, zero);  /* all ones where bit 31 is clear */
        __m128  v_highf = _mm_castsi128_ps(_mm_andnot_si128(msk1, cnst2_31));  /* 2^31 or 0.0f */
        return _mm_add_ps(v_lowf, v_highf);
    }
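    For readers who want to check the arithmetic one lane at a time, here is a scalar sketch of the same split. It is a model for illustration only, not part of the original answer, and u32_to_float_split is a hypothetical name.

```c
#include <stdint.h>

/* Scalar model of one lane: convert the low 31 bits as a signed integer,
   then add 2^31 as a float when bit 31 of the input is set. */
static float u32_to_float_split(uint32_t v)
{
    uint32_t v_high = v & 0x80000000u;             /* bit 31 only */
    int32_t  v_low  = (int32_t)(v & 0x7FFFFFFFu);  /* 0 .. 2^31-1, never negative */
    float v_lowf  = (float)v_low;                  /* signed conversion is safe here */
    float v_highf = v_high ? 2147483648.0f : 0.0f; /* 2^31 or 0 */
    return v_lowf + v_highf;
}
```

    For v=4294967295U this gives v_lowf=2147483648.0f and v_highf=2147483648.0f, so the sum is 4294967296.0f as expected. Note that the low part is rounded once by the conversion and the sum is rounded again, so for some inputs at or above 2^31 the result can differ from a correctly rounded conversion by one unit in the last place.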
    


    Update

    It was possible to reduce the number of instructions:

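    static inline __m128 _mm_cvtepu32_v4_ps(const __m128i v)
    {
        const __m128i msk0     = _mm_set1_epi32(0x7FFFFFFF);  /* selects bits 0..30 */
        const __m128i cnst2_31 = _mm_set1_epi32(0x4F000000);  /* IEEE representation of float 2^31 */

        __m128i msk1    = _mm_srai_epi32(v, 31);           /* all ones where bit 31 is set */
        __m128i v_low   = _mm_and_si128(msk0, v);          /* bits 0..30, never negative */
        __m128  v_lowf  = _mm_cvtepi32_ps(v_low);          /* signed conversion is safe here */
        __m128  v_highf = _mm_castsi128_ps(_mm_and_si128(msk1, cnst2_31));  /* 2^31 or 0.0f */
        return _mm_add_ps(v_lowf, v_highf);
    }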
    

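    The intrinsic _mm_srai_epi32(v,31) is an arithmetic right shift: it shifts in copies of the sign bit, so each lane becomes all ones if the most significant bit of v is set and all zeros otherwise. That mask can be ANDed directly with cnst2_31, which makes v_high, zero and the compare from the first version unnecessary.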
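    To see what the shift produces, the sketch below extracts the mask and the resulting high part for a single value. srai_mask_lane0 and high_part_lane0 are hypothetical helper names introduced only for this illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Arithmetic shift by 31 replicates bit 31 into every bit of the lane:
   0xFFFFFFFF when bit 31 is set, 0x00000000 otherwise. */
static uint32_t srai_mask_lane0(uint32_t x)
{
    __m128i v    = _mm_set1_epi32((int32_t)x);
    __m128i msk1 = _mm_srai_epi32(v, 31);
    return (uint32_t)_mm_cvtsi128_si32(msk1);
}

/* ANDing that mask with 0x4F000000 yields the float 2^31 or 0.0f. */
static float high_part_lane0(uint32_t x)
{
    __m128i v        = _mm_set1_epi32((int32_t)x);
    __m128i cnst2_31 = _mm_set1_epi32(0x4F000000);
    __m128i msk1     = _mm_srai_epi32(v, 31);
    return _mm_cvtss_f32(_mm_castsi128_ps(_mm_and_si128(msk1, cnst2_31)));
}
```

    For example, srai_mask_lane0(0x80000000U) is 0xFFFFFFFF and srai_mask_lane0(0x7FFFFFFFU) is 0, so high_part_lane0 returns 2147483648.0f for any input with bit 31 set and 0.0f otherwise.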
