Floating point representations seem to do integer arithmetic correctly - why?

I've been playing around with floating point numbers a little bit, and based on what I've learned about them in the past, the fact that 0.1 + 0.2 ends up bein

8 answers

爱一瞬间的悲伤

2021-01-06 10:44

Is this a feature of the design, a mathematical artifact, or some optimisation done by compilers and runtime environments?

It's a feature of the real numbers. A theorem from modern algebra (modern algebra, not high school algebra; math majors take a class in modern algebra after their basic calculus and linear algebra classes) says that for any integer base b ≥ 2, any positive real number r can be expressed as r = a * b^p, where a is in [1,b) and p is some integer. For example, 1024₁₀ = 1.024₁₀*10³. It is this theorem that justifies our use of scientific notation.
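As a rough illustration of that decomposition, here is a minimal JavaScript sketch. The helper name `normalize` is hypothetical and not part of the answer. It works on doubles, so for bases other than 2 the returned `a` may carry a small rounding error.

```javascript
// Hypothetical helper: write a positive number r as a * b^p with 1 <= a < b.
function normalize(r, b) {
  if (!(r > 0) || !Number.isFinite(r)) {
    throw new RangeError("r must be a positive finite number");
  }
  if (!Number.isInteger(b) || b < 2) {
    throw new RangeError("b must be an integer >= 2");
  }
  let a = r;
  let p = 0;
  // Scale down while a is too large for the interval [1, b).
  while (a >= b) {
    a /= b;
    p += 1;
  }
  // Scale up while a is too small for the interval [1, b).
  while (a < 1) {
    a *= b;
    p -= 1;
  }
  return { a, p };
}

console.log(normalize(1024, 10)); // p is 3, a is approximately 1.024
console.log(normalize(1024, 2));  // { a: 1, p: 10 }
console.log(normalize(0.375, 2)); // { a: 1.5, p: -2 }
```

Dividing or multiplying by a power of two is exact in binary floating point (barring overflow and underflow), which is why the base 2 results come out exact while the base 10 result is only approximate.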

That number a can be classified as terminating (e.g. 1.0), repeating (1/3 = 0.333...), or non-repeating (the representation of pi). There's a minor issue here with terminating numbers: any terminating number can also be represented as a repeating number. For example, 0.999... and 1 are the same number. This ambiguity in representation can be resolved by specifying that numbers that can be represented in terminating form are represented as such.

What you have discovered is a consequence of the fact that all integers have a terminating representation in any base.
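This can be checked informally in JavaScript with the standard `Number.prototype.toString(radix)` method. The wrapper name `binaryDigits` below is just an illustrative assumption.

```javascript
// Illustrative wrapper: the base 2 digits of the double that x is stored as.
function binaryDigits(x) {
  return x.toString(2);
}

console.log(binaryDigits(10));   // "1010"
console.log(binaryDigits(1024)); // "10000000000"
console.log(binaryDigits(0.5));  // "0.1"
// 0.1 has no terminating binary expansion, so this prints the digits of
// the nearest double instead, which fill the whole significand.
console.log(binaryDigits(0.1));
```

The integers and 0.5 come out as short digit strings, while 0.1 does not.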

There is an issue here with how the reals are represented in a computer. Just as int and long long int don't represent all of the integers, float and double don't represent all of the reals. The scheme used on most computers to represent a real number r is to store it in the form r = a*2^p, but with the mantissa (or significand) a limited to a certain number of bits and the exponent p limited to a finite range. What this means is that some integers cannot be represented exactly. For example, even though a googol (10¹⁰⁰) is an integer, its floating point representation is not exact. The base 2 representation of a googol is a 333 bit number. This 333 bit mantissa is cut down to 52+1 bits.

One consequence of this is that double precision arithmetic is no longer exact, even for integers, if the integers in question are greater than 2⁵³. Try your experiment using the type unsigned long long int on values between 2⁵³ and 2⁶⁴. You'll find that double precision arithmetic is no longer exact for these large integers.
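The googol example can be verified with a short JavaScript sketch, assuming an engine with BigInt support. The variable names are illustrative only.

```javascript
// The exact googol as an arbitrary-precision integer.
const exactGoogol = 10n ** 100n;

// The double nearest to a googol, converted back to an exact integer.
const doubleGoogol = BigInt(1e100);

// Number of binary digits needed to write the exact value.
const googolBits = exactGoogol.toString(2).length;

console.log(googolBits);                   // 333
console.log(doubleGoogol === exactGoogol); // false
console.log(doubleGoogol - exactGoogol);   // the rounding error, a nonzero integer
```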
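The same effect shows up in JavaScript, where every Number is a double. A minimal sketch (the name `limit` is an illustrative choice):

```javascript
const limit = 2 ** 53; // 9007199254740992

console.log(limit - 1 === Number.MAX_SAFE_INTEGER); // true
console.log(limit + 1 === limit); // true: 2^53 + 1 is not representable and rounds back to 2^53
console.log(limit + 2 === limit); // false: even integers just above 2^53 are still representable

// BigInt arithmetic stays exact at this size.
console.log(BigInt(limit) + 1n);  // 9007199254740993n
```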

后悔当初

2021-01-06 10:44

That method only works when you are adding a small enough integer to a very large integer, and even in that case you are not representing both of the integers in the 'floating point' format.

感情败类

2021-01-06 10:55

Integers within the representable range are exactly representable by the machine; non-integer values (well, most of them) are not.

If you count "basic integer math" as a "feature", then yes, you can assume that correctly implemented arithmetic is a feature.

失恋的感觉

2021-01-06 10:55

Integers are exact because the imprecision results mainly from the way we write decimal fractions, and secondarily because many rational numbers simply don't have terminating representations in any given base.

See: https://stackoverflow.com/a/9650037/140740 for the full explanation.

灰色年华

2021-01-06 11:00
Not every real number can be represented as a floating point number, because of the way they are encoded. The wiki page explains it better than I can: http://en.wikipedia.org/wiki/IEEE_754-1985. So when you compare floating point numbers, you should use a delta:
```
// Take the absolute difference so the sign of the error does not matter.
Math.abs(myFloat - expectedFloat) < delta
```
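Choose delta relative to the magnitude of the values being compared, for example a small multiple of the machine epsilon (Number.EPSILON in JavaScript). The smallest representable positive floating point number is too small to serve as delta: with it, the test passes only when the two values are identical.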
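A minimal JavaScript sketch of such a comparison follows. The function name `nearlyEqual` and the default tolerance of `4 * Number.EPSILON` are assumptions made for illustration, not something the answer prescribes. It scales the tolerance by the larger operand, one common way to pick a delta.

```javascript
// Hypothetical helper: compare two doubles with a relative tolerance.
function nearlyEqual(a, b, relTol = 4 * Number.EPSILON) {
  if (a === b) return true; // exact matches, including equal infinities
  if (!Number.isFinite(a) || !Number.isFinite(b)) return false; // NaN or mismatched infinities
  const scale = Math.max(Math.abs(a), Math.abs(b));
  return Math.abs(a - b) <= relTol * scale;
}

console.log(0.1 + 0.2 === 0.3);           // false
console.log(nearlyEqual(0.1 + 0.2, 0.3)); // true
console.log(nearlyEqual(1, 1.001));       // false

// With the smallest positive double as an absolute delta, the test
// behaves like exact equality.
console.log(Math.abs(0.1 + 0.2 - 0.3) < Number.MIN_VALUE); // false
```

A purely relative tolerance is strict near zero, so code that compares against 0 may also want an absolute floor.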

爱一瞬间的悲伤

2021-01-06 11:06

I'm writing this under the assumption that JavaScript uses a double-precision floating-point representation for all numbers.

Some numbers have an exact representation in the floating-point format, in particular all integers such that |x| < 2^53. Some numbers don't, in particular fractions such as 0.1 or 0.2, which become infinitely repeating fractions in binary representation.

If all operands and the result of an operation have an exact representation, then it is safe to compare the result using ==.
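A minimal sketch of that rule in JavaScript. The helper name `isExactIntegerOp` is hypothetical. It relies on the standard `Number.isSafeInteger`, which accepts integers with |x| ≤ 2^53 - 1.

```javascript
// Hypothetical helper: true when both operands and the result are integers
// small enough to be represented exactly as doubles.
function isExactIntegerOp(a, b, result) {
  return (
    Number.isSafeInteger(a) &&
    Number.isSafeInteger(b) &&
    Number.isSafeInteger(result)
  );
}

console.log(isExactIntegerOp(1, 2, 1 + 2));           // true
console.log(1 + 2 === 3);                             // true
console.log(isExactIntegerOp(2 ** 53, 1, 2 ** 53 + 1)); // false: outside the exact range

// Exactness is not limited to integers: 0.5, 0.25 and 0.75 are all
// terminating binary fractions, while 0.1, 0.2 and 0.3 are not.
console.log(0.5 + 0.25 === 0.75); // true
console.log(0.1 + 0.2 === 0.3);   // false
```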

Related questions:

What number in binary can only be represented as an approximation?

Why can't decimal numbers be represented exactly in binary?
