Converting Int to Float or Float to Int using Bitwise operations (software floating point)

一生所求 2020-12-01 13:04

I was wondering if you could help explain the process of converting an integer to a float, or a float to an integer. For my class, we are to do this using only bitwise operators...

3 Answers
  •  时光取名叫无心
    2020-12-01 13:36

    Joe Z's answer is elegant, but the range of input values it handles is very limited. A 32-bit float can store every integer in the following range exactly:

    [-2^24...+2^24] = [-16777216...+16777216]

    and some other values outside this range.
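The boundary at 2^24 can be checked directly. The sketch below assumes that `float` is IEEE 754 single precision, and it uses the built-in cast only to observe which integers survive the conversion. The helper names `survives_float_round_trip` and `check_exact_range` are made up for this illustration.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if `value` survives an int -> float -> integer
 * round trip unchanged, i.e. the float represents it exactly.
 * Assumes float is IEEE 754 single precision (24 significant bits). */
static int survives_float_round_trip(int value)
{
    volatile float f = (float)value;   /* volatile: force storage as a real float */
    /* compare in long long so that a float equal to 2^31 cannot overflow */
    return (long long)f == (long long)value;
}

/* Self-check of the range stated above. */
void check_exact_range(void)
{
    assert(survives_float_round_trip(16777216));    /* 2^24: exact */
    assert(survives_float_round_trip(-16777216));   /* -2^24: exact */
    assert(!survives_float_round_trip(16777217));   /* 2^24 + 1: needs 25 bits */
    assert(survives_float_round_trip(16777218));    /* 2^24 + 2: even, still exact */
}
```

Above 2^24 the spacing between neighbouring floats is 2, so only even integers are exact up to 2^25, then only multiples of 4, and so on.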

    The function below covers the whole int range:

    #include <limits.h>  // INT_MIN

    float int2float(int value)
    {
        // handles all values from [-2^24...2^24]
        // outside this range only some integers may be represented exactly
        // this method will use truncation 'rounding mode' during conversion
    
        // we can safely reinterpret it as 0.0
        if (value == 0) return 0.0;
    
        if (value == (1U<<31)) // ie -2^31
        {
            // -(-2^31) = -2^31 so we'll not be able to handle it below - use const
            // value = 0xCF000000;
            return (float)INT_MIN;  // *((float*)&value); is undefined behaviour
        }
    
        int sign = 0;
    
        // handle negative values
        if (value < 0)
        {
            sign = 1U << 31;
            value = -value;
        }
    
        // right-shifting a negative signed value is implementation-defined
        // (every compiler I know does an arithmetic shift that copies the sign bit),
        // so use an unsigned copy of the absolute value for the shifts
        unsigned int abs_value_copy = value;
    
        // find leading one
        int bit_num = 31;
        int shift_count = 0;
    
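        // bit_num >= 0 so that a leading one in bit 0 (value == 1) is also found
        for(; bit_num >= 0; bit_num--)
        {
            if (abs_value_copy & (1U << bit_num))
            {
                if (bit_num >= 23)
                {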
                    // need to shift right
                    shift_count = bit_num - 23;
                    abs_value_copy >>= shift_count;
                }
                else
                {
                    // need to shift left
                    shift_count = 23 - bit_num;
                    abs_value_copy <<= shift_count;
                }
                break;
            }
        }
    
        // exponent is biased by 127
        int exp = bit_num + 127;
    
        // clear leading 1 (bit #23) (it will implicitly be there but not stored)
        int coeff = abs_value_copy & ~(1<<23);
    
        // move exp to the right place
        exp <<= 23;
    
        union
        {
            int rint;
            float rfloat;
        }ret = { sign | exp | coeff };
    
        return ret.rfloat;
    }
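The last step of the function, assembling the sign, the biased exponent, and the stored fraction into one 32-bit word, can be looked at in isolation. This is a minimal sketch, assuming a 32-bit IEEE 754 `float`. `pack_float` and `check_pack_float` are hypothetical helpers, and the sketch reinterprets the bits with `memcpy` rather than with the union used above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a float from its three IEEE 754 fields.
 *   sign     : 0 or 1                     -> bit 31
 *   exponent : unbiased power of two      -> bits 30..23, stored as exponent + 127
 *   fraction : low 23 bits of significand -> bits 22..0 (leading 1 is implicit)
 * Only normal numbers are handled (exponent in -126..127). */
static float pack_float(uint32_t sign, int exponent, uint32_t fraction)
{
    uint32_t bits = (sign << 31)
                  | ((uint32_t)(exponent + 127) << 23)
                  | (fraction & 0x7FFFFFu);
    float result;
    memcpy(&result, &bits, sizeof result);  /* reinterpret the bits, no conversion */
    return result;
}

/* Worked example: 10 = 1010b, leading one at bit 3.
 * Shifting left by 23 - 3 = 20 gives 0xA00000; clearing bit 23 leaves 0x200000. */
void check_pack_float(void)
{
    assert(pack_float(0, 3, 0x200000u) == 10.0f);   /* bits 0x41200000 */
    assert(pack_float(1, 3, 0x200000u) == -10.0f);  /* bits 0xC1200000 */
    assert(pack_float(0, 0, 0) == 1.0f);            /* bits 0x3F800000 */
}
```

Here `exponent` plays the role of `bit_num` and `fraction` the role of `coeff` in the function above.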
    

    Of course, there are other, branchless ways to find the absolute value of an int. Similarly, counting leading zeros can also be done without a branch, so treat this code as just an example ;-).
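As one possible illustration of those branchless alternatives, here is a sketch assuming 32-bit two's complement integers. The function names are invented for this example, and the bit-smearing plus population-count approach is a common technique swapped in here, not something taken from the answer above.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless absolute value, computed entirely on unsigned values so that
 * INT32_MIN does not overflow: mask is all ones for negative input, else 0. */
static uint32_t abs_branchless(int32_t value)
{
    uint32_t u = (uint32_t)value;
    uint32_t mask = 0u - (u >> 31);      /* 0xFFFFFFFF if sign bit set, else 0 */
    return (u ^ mask) - mask;            /* two's complement negate when negative */
}

/* Branchless index of the highest set bit, floor(log2(x)), for x != 0.
 * Smear the leading one into every lower bit, then count the set bits. */
static int highest_set_bit(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    /* population count of the smeared value (SWAR bit tricks) */
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (int)((x * 0x01010101u) >> 24) - 1;
}

void check_branchless(void)
{
    assert(abs_branchless(-5) == 5u);
    assert(abs_branchless(INT32_MIN) == 0x80000000u);  /* 2^31 fits in unsigned */
    assert(highest_set_bit(1u) == 0);
    assert(highest_set_bit(10u) == 3);
    assert(highest_set_bit(0x80000000u) == 31);
}
```

In terms of the function above, `highest_set_bit(abs_branchless(value))` would correspond to `bit_num` for any nonzero `value`. Because the absolute value is kept unsigned, -2^31 needs no special handling in that step.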
