Converting Int to Float or Float to Int using Bitwise operations (software floating point)

后端未结

关注

 3  579

一生所求 2020-12-01 13:04

I was wondering if you could help explain the process on converting an integer to float, or a float to an integer. For my class, we are to do this using only bitwise operato

3条回答

时光取名叫无心 (楼主)

2020-12-01 13:36

Joe Z's answer is elegant but range of input values is highly limited. 32 bit float can store all integer values from the following range:

[-2²⁴...+2²⁴] = [-16777216...+16777216]

and some other values outside this range.

The whole range would be covered by this:

float int2float(int value)
{
    // handles all values from [-2^24...2^24]
    // outside this range only some integers may be represented exactly
    // this method will use truncation 'rounding mode' during conversion

    // we can safely reinterpret it as 0.0
    if (value == 0) return 0.0;

    if (value == (1U<<31)) // ie -2^31
    {
        // -(-2^31) = -2^31 so we'll not be able to handle it below - use const
        // value = 0xCF000000;
        return (float)INT_MIN;  // *((float*)&value); is undefined behaviour
    }

    int sign = 0;

    // handle negative values
    if (value < 0)
    {
        sign = 1U << 31;
        value = -value;
    }

    // although right shift of signed is undefined - all compilers (that I know) do
    // arithmetic shift (copies sign into MSB) is what I prefer here
    // hence using unsigned abs_value_copy for shift
    unsigned int abs_value_copy = value;

    // find leading one
    int bit_num = 31;
    int shift_count = 0;

    for(; bit_num > 0; bit_num--)
    {
        if (abs_value_copy & (1U<= 23)
            {
                // need to shift right
                shift_count = bit_num - 23;
                abs_value_copy >>= shift_count;
            }
            else
            {
                // need to shift left
                shift_count = 23 - bit_num;
                abs_value_copy <<= shift_count;
            }
            break;
        }
    }

    // exponent is biased by 127
    int exp = bit_num + 127;

    // clear leading 1 (bit #23) (it will implicitly be there but not stored)
    int coeff = abs_value_copy & ~(1<<23);

    // move exp to the right place
    exp <<= 23;

    union
    {
        int rint;
        float rfloat;
    }ret = { sign | exp | coeff };

    return ret.rfloat;
}

Of course there are other means to find abs value of int (branchless). Similarly couting leading zeros can also be done without a branch so treat this example as example ;-).

0 讨论(0)

查看其它3个回答