Golang floating point precision float32 vs float64

遇见更好的自我 2020-12-10 10:48

I wrote a program to demonstrate floating point error in Go:

package main

import "fmt"

func main() {
    a := float64(0.2)
    a += 0.1
    a -= 0.3
    var i int
    for i = 0; a < 1.0; i++ {
        a += a
    }
    fmt.Printf("After %d iterations, a = %e\n", i, a)
}

It prints After 54 iterations, a = 1.000000e+00, matching the behavior of the same program written in C. But if float32 is used instead, the Go program gets stuck in an infinite loop, while the C program built with float prints After 27 iterations, a = 1.600000e+00. Why does the Go program behave differently from the C program when using float32?

2 Answers
  • 2020-12-10 11:04

    Using math.Float32bits and math.Float64bits, you can see how Go represents the different decimal values as IEEE 754 binary values:

    Playground: https://play.golang.org/p/ZqzdCZLfvC
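
    A minimal program along these lines reproduces that output (my reconstruction; the actual playground code may differ in details):

        package main

        import (
            "fmt"
            "math"
        )

        func main() {
            fmt.Printf("float32(0.1): %032b\n", math.Float32bits(0.1))
            fmt.Printf("float32(0.2): %032b\n", math.Float32bits(0.2))
            fmt.Printf("float32(0.3): %032b\n", math.Float32bits(0.3))
            fmt.Printf("float64(0.1): %064b\n", math.Float64bits(0.1))
            fmt.Printf("float64(0.2): %064b\n", math.Float64bits(0.2))
            fmt.Printf("float64(0.3): %064b\n", math.Float64bits(0.3))
        }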

    Result:

    float32(0.1): 00111101110011001100110011001101
    float32(0.2): 00111110010011001100110011001101
    float32(0.3): 00111110100110011001100110011010
    float64(0.1): 0011111110111001100110011001100110011001100110011001100110011010
    float64(0.2): 0011111111001001100110011001100110011001100110011001100110011010
    float64(0.3): 0011111111010011001100110011001100110011001100110011001100110011
    

    If you convert these binary representations to decimal values and redo your loop's setup, you can see what the initial value of a becomes for float32. Summing the three decimal values in exact arithmetic gives:

    0.20000000298023224
    + 0.10000000149011612
    - 0.30000001192092896
    = -7.4505806e-9
    

    (In the actual float32 evaluation, the intermediate sum 0.2 + 0.1 rounds up to exactly float32(0.3), so a in fact ends up exactly 0 rather than slightly negative.) Either way, a value that is not positive can never double its way up to 1.
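
    You can verify what a actually holds with a few lines of Go (my own check, not part of the original answer):

        package main

        import "fmt"

        func main() {
            a := float32(0.2)
            a += 0.1 // the exact sum 0.30000000447... rounds up to exactly float32(0.3)
            a -= 0.3 // subtracting float32(0.3) therefore leaves exactly 0
            fmt.Println(a) // 0: doubling it can never reach 1, so the loop never terminates
        }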

    So, why does C behave differently?

    If you look at the binary patterns (and know a little about how binary values are represented), you can see that Go rounds the last bit, while I assume C simply truncates (crops) it instead.

    So, in a sense, while neither Go nor C can represent 0.1 exactly in a float, Go uses the value closest to 0.1:

    Go:   00111101110011001100110011001101 => 0.10000000149011612
    C(?): 00111101110011001100110011001100 => 0.09999999403953552
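
    You can decode both bit patterns yourself with math.Float32frombits; here is a small sketch of mine (the 0b literals require Go 1.13 or later):

        package main

        import (
            "fmt"
            "math"
        )

        func main() {
            rounded := math.Float32frombits(0b00111101110011001100110011001101)   // Go's float32(0.1)
            truncated := math.Float32frombits(0b00111101110011001100110011001100) // last bit cropped
            fmt.Printf("rounded:   %.17g\n", rounded)   // 0.10000000149011612
            fmt.Printf("truncated: %.17g\n", truncated) // 0.099999994039535522
        }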
    

    Edit:

    I posted a question about how C handles float constants, and from the answer it seems that any implementation of the C standard is allowed to do either. The implementation you tried it with just did it differently than Go.

  • 2020-12-10 11:14

    I agree with ANisus that Go is doing the right thing. Concerning C, however, I'm not convinced by his guess.

    The C standard does not dictate this, but most C library implementations will convert the decimal representation to the nearest float (at least to comply with IEEE 754-2008 or ISO 10967), so I don't think truncation is the most probable explanation.

    There are several reasons why the C program's behavior might differ... In particular, some intermediate computations might be performed with excess precision (double or long double).

    The most probable cause I can think of is that you wrote 0.1 instead of 0.1f in C. In that case, you would have introduced excess precision in the initialization: you sum float a + double 0.1, so the float is converted to double, and the result is then converted back to float.

    If I emulate these operations

    float32(float32(float32(0.2) + float64(0.1)) - float64(0.3))
    

    Then I find something near 1.1920929e-8f

    After 27 iterations, this sums to 1.6f
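
    In runnable Go, that emulation looks something like this (my own sketch; the explicit float64 conversions stand in for C's implicit float-to-double promotions):

        package main

        import "fmt"

        func main() {
            // Emulate C's `float a = 0.2f; a += 0.1; a -= 0.3;` where the
            // double constants promote each intermediate operation to double.
            a := float32(0.2)
            a = float32(float64(a) + 0.1)
            a = float32(float64(a) - 0.3)
            fmt.Println(a) // 1.1920929e-08, a small positive value

            var i int
            for i = 0; a < 1.0; i++ {
                a += a
            }
            fmt.Printf("After %d iterations, a = %g\n", i, a) // After 27 iterations, a = 1.6
        }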
