Of course most languages have library functions for this, but suppose I want to do it myself.
Suppose that the float is given like in a C or Java program (except for any type suffix such as f or d).
My first thought is to parse the string into an int64 mantissa and an int decimal exponent, keeping only the first 18 digits of the mantissa. For example, 1.2345e-5 would be parsed into 12345 and -9. Then I would keep multiplying the mantissa by 10 and decrementing the exponent until the mantissa was 18 digits long (>56 bits of precision).
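Here is a minimal sketch of that parsing step in C, assuming a well-formed, non-negative input and skipping overflow checks; the function name and layout are just illustration:

```c
#include <ctype.h>
#include <stdint.h>

/* Parse e.g. "1.2345e-5" into mantissa = 12345 and exponent = -9, then
 * scale the mantissa up to 18 digits. Sketch only: assumes well-formed,
 * non-negative input and does no overflow checking. */
static void parse_decimal(const char *s, int64_t *mantissa, int *exponent)
{
    int64_t m = 0;
    int e = 0, digits = 0;

    for (; isdigit((unsigned char)*s); s++) {        /* integer part */
        if (m == 0 && *s == '0') continue;           /* skip leading zeros */
        if (digits < 18) { m = m * 10 + (*s - '0'); digits++; }
        else e++;             /* dropped low-order digit: bump exponent */
    }
    if (*s == '.') {
        for (s++; isdigit((unsigned char)*s); s++) { /* fractional part */
            if (m == 0 && *s == '0') { e--; continue; }
            if (digits < 18) { m = m * 10 + (*s - '0'); digits++; e--; }
        }                     /* digits past the 18th are simply dropped */
    }
    if (*s == 'e' || *s == 'E') {                    /* decimal exponent */
        int sign = 1, v = 0;
        if (*++s == '+') s++; else if (*s == '-') { sign = -1; s++; }
        for (; isdigit((unsigned char)*s); s++) v = v * 10 + (*s - '0');
        e += sign * v;
    }
    if (m != 0)              /* scale to 18 digits (>56 bits of precision) */
        while (m < 100000000000000000LL) { m *= 10; e--; }
    *mantissa = m;
    *exponent = e;
}
```

With this, "1.2345e-5" first becomes 12345 and -9, and normalization leaves 123450000000000000 and -22.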
Then I would look the decimal exponent up in a table to find a factor and binary exponent that can be used to convert the number from decimal n*10^m form to binary p*2^q form. The factor would be another int64, so I'd multiply the mantissa by it and keep the top 64 bits of the resulting 128-bit product. That int64 mantissa can be cast to a float losing only the necessary precision, and the 2^q exponent can be applied using multiplication with no loss of precision.
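A sketch of that conversion step, under the same caveats: the table entry format and the single precomputed constant below (10^-22 ~= factor * 2^-137, with the factor normalized into [2^63, 2^64)) are my own illustration, and the 64x64 -> 128-bit multiply uses GCC/Clang's unsigned __int128 extension:

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* One table entry: 10^d ~= factor * 2^bexp, with factor in [2^63, 2^64)
 * so the multiply below keeps as many significant bits as possible. */
struct pow10_entry { uint64_t factor; int bexp; };

/* Single illustrative entry, for d = -22 (what "1.2345e-5" normalizes to
 * above): factor = round(10^-22 * 2^137). A real table would be
 * precomputed with arbitrary-precision arithmetic for every decimal
 * exponent a double can need. */
static const struct pow10_entry pow10_m22 = { 17422457186352049329ULL, -137 };

/* Convert n * 10^d to a double, where n is the 18-digit mantissa and
 * entry describes 10^d. */
static double decimal_to_double(uint64_t n, const struct pow10_entry *entry)
{
    unsigned __int128 product = (unsigned __int128)n * entry->factor;
    uint64_t hi = (uint64_t)(product >> 64); /* top 64 bits of the product */
    /* The cast rounds hi to double's 53 bits; ldexp applies the binary
     * exponent exactly. The +64 accounts for the discarded low half. */
    return ldexp((double)hi, entry->bexp + 64);
}

int main(void)
{
    /* 1.2345e-5, already parsed and normalized as above */
    printf("%.17g\n", decimal_to_double(123450000000000000ULL, &pow10_m22));
    return 0;
}
```

Keeping only the top 64 bits of the 128-bit product is what makes the 18-digit normalization matter: it guarantees the high word still carries more than the 53 bits a double can hold.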
I'd expect this to be very accurate and very fast, but you may also want to handle the special numbers NaN, -infinity, -0.0 and infinity; a possible pre-pass for those is sketched below. I haven't thought about denormalized numbers or rounding modes.
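For the special values, a small check before the numeric parse would do. A sketch, where the accepted spellings ("nan", "inf") are my assumption rather than a fixed grammar:

```c
#include <math.h>
#include <stdbool.h>
#include <string.h>

/* Returns true and sets *out if s is one of the special values. */
static bool parse_special(const char *s, double *out)
{
    double sign = 1.0;
    if (*s == '-') { sign = -1.0; s++; }
    else if (*s == '+') s++;

    if (strcmp(s, "nan") == 0) { *out = NAN;             return true; }
    if (strcmp(s, "inf") == 0) { *out = sign * INFINITY; return true; }
    return false;
}
```

Handling the sign this way also gives -0.0 for free: if the ordinary parse tracks the sign separately and applies it by multiplication at the end, "-0.0" comes out as a genuine negative zero.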