floating point equality in Python and in general

deft_code

As has been shown, comparing two floats (or doubles, etc.) can be problematic. Generally, instead of comparing for exact equality, they should be checked against an error bound: if they are within the error bound, they are considered equal.

That is much easier said than done. The nature of floating point makes a fixed error bound worthless. A small error bound (like 2*float_epsilon) works well when the values are near 0.0, but will fail if the values are near 1000. An error bound suited to values as large as 1,000,000.0 will be far too lax for values near 0.0.

The best solution is to know the domain of your math and pick an appropriate error bound on a case-by-case basis.
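When a single tolerance has to cover a wide range of magnitudes, a common compromise is to combine a relative bound with a small absolute bound. A minimal sketch (not from this answer; the function name and the tolerances are arbitrary illustrative choices):

def nearly_equal(a, b, rel_tol=1e-9, abs_tol=1e-12):
    # the relative bound scales with the magnitude of the operands,
    # the absolute bound handles values near 0.0
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

print(nearly_equal(1e6 + 1e-7, 1e6))  # True, within the relative bound
print(nearly_equal(1e-13, 2e-13))     # True, within the absolute bound
print(nearly_equal(1.0, 1.001))       # False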

When this is impractical or you're being lazy, units in the last place (ULPs) are a clever and robust solution. The full details are quite involved; you can read more here.

The basic idea is this: a floating point number has two pieces, a mantissa and an exponent. Generally, rounding errors only change the mantissa by a few steps. When the value is near 1.0, those steps are about the size of float_epsilon. When the value is nearer to 1,000,000, the steps are many orders of magnitude larger.
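You can see the effect directly in Python (a small aside, assuming Python 3.9+ for math.ulp; Python's floats are 64-bit doubles, so the steps are smaller than for 32-bit floats, but they grow with magnitude the same way):

import math

print(math.ulp(1.0))        # 2.220446049250313e-16, the machine epsilon
print(math.ulp(1000.0))     # ~1.1e-13
print(math.ulp(1000000.0))  # ~1.2e-10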

Google Test uses ULPs to compare floating point numbers. It chose a default of 4 ULPs for two floating point numbers to compare equal. You could also use its code as a reference to build your own ULP-style float comparator.

The difference is that if you replace factors[units_to] with 1.0 / 2.54, you're doing:

(base_value * 1.0) / 2.54

With the dictionary, you're doing:

base_value * (1.0 / 2.54)

The order of rounding matters. This is easier to see if you do:

>>> print (((25.4 / 10.0) * 1.0) / 2.54).__repr__()
1.0
>>> print ((25.4 / 10.0) * (1.0 / 2.54)).__repr__()
0.99999999999999989

Note that there is no non-deterministic or undefined behavior here. There is a standard, IEEE 754, which implementations are supposed to conform to (which is not to claim they always do).

I don't think there should be an automatic "close enough" replacement. That is often an effective way to deal with the problem, but it should be up to the programmer to decide whether and how to use it.

Finally, there are of course options for arbitrary-precision arithmetic, including python-gmp and decimal. Think about whether you actually need them, because they do have a significant performance impact.
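For the unit-conversion arithmetic in this question, the standard-library decimal and fractions modules keep the values exact. A small sketch, not part of the original answer:

from decimal import Decimal
from fractions import Fraction

# mm -> cm -> inches in decimal: the chain comes out as exactly 1
print(Decimal('25.4') / Decimal('10') / Decimal('2.54'))    # 1

# the same chain as exact rationals
print(Fraction('25.4') / 10 * (1 / Fraction('2.54')) == 1)  # True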

There is no issue with moving between regular registers and cache. You may be thinking of the x86's 80-bit extended precision.

krawyoti

Let me first answer by saying that you should read David Goldberg's classic What Every Computer Scientist Should Know About Floating-Point Arithmetic.

As some other commentators already said, the discrepancy you notice is intrinsically due to the floating-point model and has nothing to do with registers, cache or memory.

According to the floating-point model, 2.54 is actually represented as

>>> 2859785763380265 * 2 ** -50
2.54

This representation, however, is not exact:

>>> from fractions import Fraction
>>> float(Fraction(2859785763380265, 2 ** 50) - Fraction(254, 100))
3.552713678800501e-17
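(As an aside, you can recover that numerator and denominator directly with float.as_integer_ratio, available since Python 2.6:)

>>> (2.54).as_integer_ratio()   # the denominator is 2**50
(2859785763380265, 1125899906842624)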

Now, the expression you are evaluating actually is:

>>> 25.4 / 10 * (1/2.54)
0.99999999999999989

The problem lies in the 1/2.54:

>>> Fraction.from_float(1/2.54)
Fraction(1773070719437203, 4503599627370496)

But what you would expect is

>>> 1/Fraction.from_float(2.54)
Fraction(1125899906842624, 2859785763380265)

To answer your questions:

  • It is a difficult problem, but it is clearly deterministic; there is nothing mysterious about it.
  • You cannot automatically replace equality with a close-enough comparison. The latter requires that you specify a tolerance, which depends on the problem at hand, i.e. on what kind of precision you are expecting from your results (a short sketch follows this list). There are also plenty of situations where you really want equality, not a close-enough comparison.
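As a footnote that is not part of the original answer: Python 3.5 added math.isclose, which provides exactly this kind of close-enough comparison with an explicit, caller-chosen tolerance:

>>> import math
>>> 25.4 / 10 * (1 / 2.54) == 1.0
False
>>> math.isclose(25.4 / 10 * (1 / 2.54), 1.0, rel_tol=1e-9)
True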
eric.frederich

Thanks for your responses. Most were very good and provided good links, so I'll just say thanks and answer my own question.

Caspin posted this link.

He also mentioned that Google Test uses ULP comparison, and when I looked at the Google code I saw that it references the same cygnus-software link.

I wound up implementing some of the algorithms in C as a Python extension and then later found that I could do it in pure Python as well. The code is posted below.

In the end, I will probably just wind up adding ULP differences to my bag of tricks.

It was interesting to see how many floating point values lie between what should be two equal numbers that never left memory. One of the articles (or the Google code) said that 4 was a good bound... but here I was able to hit 10.

>>> f1 = 25.4
>>> f2 = f1
>>> 
>>> for i in xrange(1, 11):
...     f2 /= 10.0          # to cm
...     f2 *= (1.0 / 2.54)  # to in
...     f2 *= 25.4          # back to mm
...     print 'after %2d loops there are %2d doubles between them' % (i, dulpdiff(f1, f2))
... 
after  1 loops there are  1 doubles between them
after  2 loops there are  2 doubles between them
after  3 loops there are  3 doubles between them
after  4 loops there are  4 doubles between them
after  5 loops there are  6 doubles between them
after  6 loops there are  7 doubles between them
after  7 loops there are  8 doubles between them
after  8 loops there are 10 doubles between them
after  9 loops there are 10 doubles between them
after 10 loops there are 10 doubles between them

Also interesting is how many floating point values there are between equal numbers when one of them is written out as a string and read back in.

>>> # 0 degrees Fahrenheit is -32 / 1.8 degrees Celsius
... f = -32 / 1.8
>>> s = str(f)
>>> s
'-17.7777777778'
>>> # floats between them...
... fulpdiff(f, float(s))
0
>>> # doubles between them...
... dulpdiff(f, float(s))
6255L

import struct
from functools import partial

# (c) 2010 Eric L. Frederich
#
# Python implementation of algorithms detailed here...
# from http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm

def c_mem_cast(x, f=None, t=None):
    '''
    do a c-style memory cast

    In Python...

    x = 12.34
    y = c_mem_cast(x, 'd', 'l')

    ... should be equivalent to the following in C...

    double x = 12.34;
    long   y = *(long*)&x;
    '''
    return struct.unpack(t, struct.pack(f, x))[0]
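
# NOTE: struct's native 'l' and 'i' formats map to the platform's C long and
# int. On platforms where C long is only 4 bytes (e.g. Windows), the
# double<->long casts below will raise struct.error; format 'q' (always
# 8 bytes) would be the portable choice there.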

dbl_to_lng = partial(c_mem_cast, f='d', t='l')
lng_to_dbl = partial(c_mem_cast, f='l', t='d')
flt_to_int = partial(c_mem_cast, f='f', t='i')
int_to_flt = partial(c_mem_cast, f='i', t='f')

def ulp_diff_maker(converter, negative_zero):
    '''
    Getting the ulp difference of floats and doubles is similar.
    The only difference is the offset and the converter.
    '''
    def the_diff(a, b):

        # Make a into an integer that is lexicographically ordered as a twos-complement int
        ai = converter(a)
        if ai < 0:
            ai = negative_zero - ai

        # Make b into an integer that is lexicographically ordered as a twos-complement int
        bi = converter(b)
        if bi < 0:
            bi = negative_zero - bi

        return abs(ai - bi)

    return the_diff

# double ULP difference
dulpdiff = ulp_diff_maker(dbl_to_lng, 0x8000000000000000)
# float  ULP difference
fulpdiff = ulp_diff_maker(flt_to_int, 0x80000000        )

# default to double ULP difference
ulpdiff = dulpdiff
ulpdiff.__doc__ = '''
Get the number of doubles between two doubles.
'''

If I run this

x = 0.3+0.3+0.3
if (x != 0.9): print "not equal"
if (x == 0.9): print "equal"

it prints "not equal" which is wrong but as

x-0.9

gives the float error as -1.11022302e-16 i just do something like this:

if (x - 0.9 < 10**-8): print "equal (almost)"

otherwise you can convert both to strings, I guess:

if (str(x) == str(0.9)): print "equal (strings)"

What is the best way to avoid problems like this?... In Python or in general.

What problem? You're working with physical measurements. Unless you have some really sophisticated equipment, the error in your measurements is going to be several orders of magnitude higher than floating-point epsilon. So why write code that depends on numbers being exact to 16 significant digits?

Should compilers support an option to replace all float equality checks with a 'close enough' function?

If it did, you'd get some strange results:

>>> float.tolerance = 1e-8    # hypothetical "close enough" definition
>>> a = 1.23456789
>>> b = 1.23456790
>>> c = 1.23456791
>>> a == b
True
>>> b == c
True
>>> a == c
False

If you think it's hard enough to store floats in a dictionary now, try it with a non-transitive == operator! And performance would suck, because the only way to guarantee that x == y implies hash(x) == hash(y) would be for every float to have the same hash code. And that'd be inconsistent with ints.

In order to compare floats in general, compare the absolute value of the difference of the floats to a chosen delta that is small enough for your needs.
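A minimal sketch of that comparison (the delta of 1e-9 is just an arbitrary illustrative choice, not a recommendation):

delta = 1e-9
a = 0.3 + 0.3 + 0.3
b = 0.9
print(abs(a - b) < delta)   # True, even though a == b is False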

Rhetorical Questions

  • This **is a dangerous problem**, as it might hide errors or generate infinite loops if such a comparison is used as a stop criterion.
  • Modern C/C++ compilers warn about comparing floats for equality.
  • All static code checkers I know of flag it, in the languages I use.

I suppose it is the same for Python. Since the delta to use for the comparison may vary, it must be up to the implementer to choose it, which means that no good default transformation can be provided fully automatically.

Also interesting is how many floating point values there are between equal numbers when one of them is written out as a string and read back in.

That is arguably a Python bug. That number was written out with just twelve significant digits. To uniquely identify a 64-bit double (Python's float type) you need seventeen significant digits. If Python printed out its numbers with 17 digits of precision then you would be guaranteed to get back exactly the same value.
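For example (a quick sketch, not from the original answer), formatting with 17 significant digits does round-trip exactly:

>>> f = -32 / 1.8
>>> '%.17g' % f
'-17.777777777777779'
>>> float('%.17g' % f) == f
True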

The precision issue is discussed at: http://randomascii.wordpress.com/2012/03/08/float-precisionfrom-zero-to-100-digits-2/

The focus is on 32-bit floats (which require nine significant digits to uniquely identify each value), but it briefly mentions doubles and the fact that they require 17.
