Appropriate scale for converting via BigDecimal to floating point

Question

I've written an arbitrary precision rational number class that needs to provide a way to convert to floating-point. This can be done straightforwardly via BigDecimal:

return new BigDecimal(num).divide(new BigDecimal(den), 17, RoundingMode.HALF_EVEN).doubleValue();

but this requires a value for the scale parameter when dividing the decimal numbers. I picked 17 as the initial guess because that is approximately the precision of a double precision floating point number, but I don't know whether that's actually correct.

What is the correct number to use, defined as the smallest number such that making it any larger would not make the answer any more accurate?

Answer 1:

Introduction

No finite precision suffices.

The problem posed in the question is equivalent to:

What precision p guarantees that converting any rational number x to p decimal digits and then to floating-point yields the floating-point number nearest x (or, in case of a tie, either of the two nearest x)?

To see that this is equivalent, observe that the BigDecimal divide shown in the question returns num/den to a selected number of decimal places. The question then asks whether increasing that number of decimal places could increase the accuracy of the result. Clearly, if there is a floating-point number nearer x than the result, then the accuracy could be improved. Thus, we are asking how many decimal places are needed to guarantee that the closest floating-point number (or one of the tied two) is obtained.

Since BigDecimal offers a choice of rounding methods, I will consider whether any of them suffices. For the conversion to floating-point, I presume round-to-nearest-ties-to-even is used (which BigDecimal appears to use when converting to double or float). I give a proof using the IEEE-754 binary64 format, which Java uses for double, but the proof applies to any binary floating-point format by changing the 2⁵² used below to 2^(w−1), where w is the number of bits in the significand.

Proof

One of the parameters to a BigDecimal division is the rounding method. Java’s BigDecimal has several rounding methods. We only need to consider three: ROUND_UP, ROUND_HALF_UP, and ROUND_HALF_EVEN (RoundingMode.UP, HALF_UP, and HALF_EVEN in the enum form used in the question). Arguments for the others are analogous to those below, by using various symmetries.

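In the following, suppose we convert to decimal using any large precision p (p ≥ 17 is enough for the argument below). That is, p is the number of significant decimal digits in the result of the conversion. For the values considered here, which have 16 integer digits, that corresponds to a scale of p−16 in the divide call from the question.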

Let m be the rational number 2⁵²+1+½−10^−p. The two binary64 numbers neighboring m are 2⁵²+1 and 2⁵²+2. m is closer to the first one, so that is the result we require from converting m first to decimal and then to floating-point.
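The neighbor claim can be checked with exact arithmetic. The following is a minimal sketch, not part of the original answer; the class and method names (NeighborCheck, m, closerToLower) are hypothetical. It builds m exactly as a BigDecimal and compares its distances to the two adjacent doubles.

```java
import java.math.BigDecimal;

public class NeighborCheck {
    // m = 2^52 + 1 + 1/2 - 10^-p, held exactly (hypothetical helper).
    static BigDecimal m(int p) {
        return new BigDecimal("4503599627370497.5")
                .subtract(BigDecimal.ONE.scaleByPowerOfTen(-p));
    }

    // True when m(p) lies above 2^52+1 and is strictly closer to it than to 2^52+2.
    static boolean closerToLower(int p) {
        double lo = 0x1p52 + 1;        // exactly 2^52 + 1
        double hi = Math.nextUp(lo);   // the next representable double
        // new BigDecimal(double) converts the double's value exactly.
        BigDecimal below = m(p).subtract(new BigDecimal(lo));
        BigDecimal above = new BigDecimal(hi).subtract(m(p));
        return below.signum() > 0 && below.compareTo(above) < 0;
    }

    public static void main(String[] args) {
        // The two neighbors are adjacent doubles, one unit apart.
        System.out.println(Math.nextUp(0x1p52 + 1) == 0x1p52 + 2);
        System.out.println(closerToLower(17));
        System.out.println(closerToLower(100));
    }
}
```

All three lines should print true: the two values are adjacent doubles, and m falls short of their midpoint by exactly 10^−p.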

In decimal, m is 4503599627370497.4999…, where there are p−1 trailing 9s. When rounded to p significant digits with ROUND_UP, ROUND_HALF_UP, or ROUND_HALF_EVEN, the result is 4503599627370497.5 = 2⁵²+1+½. (Recognize that, at the position where rounding occurs, there are 16 trailing 9s being discarded, effectively a fraction of .9999999999999999 relative to the rounding position. In ROUND_UP, any non-zero discarded amount causes rounding up. In ROUND_HALF_UP and ROUND_HALF_EVEN, a discarded amount greater than ½ at that position causes rounding up.)

2⁵²+1+½ is equally close to the neighboring binary64 numbers 2⁵²+1 and 2⁵²+2, so the round-to-nearest-ties-to-even method produces 2⁵²+2.

Thus, the result is 2⁵²+2, which is not the binary64 value closest to m.

Therefore, no finite precision p suffices to round all rational numbers correctly.
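The counterexample can be reproduced with the divide call from the question. This is a sketch under stated assumptions, not code from the original answer: the class and method names are hypothetical, and the argument is rephrased in terms of the scale s (digits after the decimal point) that the question passes to divide, so the rational used is 2⁵²+1+½−10^−(s+1), for s ≥ 1.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

public class ScaleCounterexample {
    // The conversion from the question, with the scale as a parameter.
    static double viaBigDecimal(BigInteger num, BigInteger den, int scale) {
        return new BigDecimal(num)
                .divide(new BigDecimal(den), scale, RoundingMode.HALF_EVEN)
                .doubleValue();
    }

    // Numerator and denominator of 2^52 + 1 + 1/2 - 10^-(scale+1)
    // (hypothetical helper; the fraction is not reduced to lowest terms).
    static BigInteger[] counterexample(int scale) {
        BigInteger tenPow = BigInteger.TEN.pow(scale + 1);
        BigInteger den = tenPow.shiftLeft(1);                                 // 2 * 10^(scale+1)
        BigInteger whole = BigInteger.ONE.shiftLeft(52).add(BigInteger.ONE);  // 2^52 + 1
        BigInteger num = whole.multiply(den)
                .add(tenPow)
                .subtract(BigInteger.valueOf(2));
        return new BigInteger[] { num, den };
    }

    public static void main(String[] args) {
        double nearest = 0x1p52 + 1; // the double closest to every one of these rationals
        for (int scale : new int[] { 17, 50, 1000 }) {
            BigInteger[] r = counterexample(scale);
            double got = viaBigDecimal(r[0], r[1], scale);
            System.out.printf("scale=%d got=%.1f nearest=%.1f%n", scale, got, nearest);
        }
    }
}
```

For each scale tried, the decimal quotient rounds up to exactly 4503599627370497.5, and the tie then resolves to the even neighbor 2⁵²+2 rather than the nearest value 2⁵²+1. Raising the scale only moves the failing input; it does not remove it.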

Source: https://stackoverflow.com/questions/58277384/appropriate-scale-for-converting-via-bigdecimal-to-floating-point

Tags

java

floating-point

bigdecimal

rational-number