Question
What is a tight lower bound on the number N of irrational numbers, expressed as doubles in Matlab on a 64-bit machine, that I can multiply together while retaining confidence in k decimal digits of the product? For example, what precision could I expect after multiplying together ~10^12 doubles encoding different random chunks of pi?
Answer 1:
A 64-bit floating point number, assuming standard IEEE 754, has 52+1 bits of mantissa.
That means the relative precision lies between 1.0000...0 and 1.0000...1, where the number of binary digits after the binary point is 52. (You can think of 1.000...0 as what is stored in binary in the mantissa, AKA the significand.)
The error is 1/2 to the power of 52, divided by 2 (half the resolution), i.e. 2^-53. Note I chose the relative precision as close to 1.0 as possible, because that is the worst case (between, say, 1.111..01 and 1.111..11 the relative precision is better).
In decimal, the worst case relative precision of a double is 1.11E-16.
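These constants are easy to confirm in Matlab, where eps is the spacing between 1 and the next representable double (a quick sanity check, assuming IEEE 754 doubles):
eps       % ans = 2.2204e-16, i.e. 2^-52: the resolution just above 1.0
eps/2     % ans = 1.1102e-16: half the resolution, the worst-case relative rounding error
2^-53     % ans = 1.1102e-16: the same value written as a power of 2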
If you multiply N doubles with this precision, the accumulated relative error (assuming no additional error due to intermediate rounding) is at most:
1 - (1 - 1.11E-16)^N
So if you multiply pi (or any double) 10^12 times, the upper bound on the relative error is:
1.1102e-004
That means you can have confidence in about 4-5 digits.
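To evaluate this bound in Matlab for very large N without losing digits, it is safer to go through log1p and expm1 than to form (1 - 1.11E-16)^N and subtract (a sketch; the direct computation also happens to be adequate here):
N = 1e12;
e = eps/2;                      % 1.1102e-16
bound = -expm1(N * log1p(-e))   % ans = 1.1102e-04, the upper bound quoted above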
You can ignore intermediate rounding error if your CPU has support for extended precision floating point numbers for intermediate results.
If there is no extended precision FPU (floating point unit) used, rounding in intermediate steps introduces additional error (of the same size as the error due to each multiplication). That means a strict upper bound on the error is calculated as:
1 -
  ((1 - 1.11E-16) * (1 - 1.11E-16) * (1 - 1.11E-16)
   * (1 - 1.11E-16) * (1 - 1.11E-16)   % for each multiplication, then its rounding
   ... (another N-4 lines here) ...
   * (1 - 1.11E-16) * (1 - 1.11E-16))
= 1 - (1 - 1.11E-16)^(N*2-1)
If N is too large, this takes too long to evaluate term by term. For example, with N = 10^4, the possible error (with intermediate rounding) is 2.2204e-012, which is double the error without intermediate rounding, 1 - (1 - 1.11E-16)^N = 1.1102e-012.
Approximately, we can say that intermediate rounding doubles the error.
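The doubling is easy to check numerically in Matlab; using the N = 10^4 example above (a sketch):
N = 1e4;
e = eps/2;
noIntermediate   = -expm1(N * log1p(-e))         % ans ~ 1.1102e-12
withIntermediate = -expm1((2*N - 1) * log1p(-e)) % ans ~ 2.2e-12, roughly double the bound without intermediate rounding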
Suppose you multiplied pi 10^12 times and there was no extended precision FPU. This might be because you write intermediate results to memory (and perhaps do something else) before continuing (just make sure the compiler hasn't reordered your instructions so that there is no FPU result accumulation). Then a strict upper bound on your relative error is:
2.22e-004
Note that confidence in a number of decimal digits doesn't mean the result will always be correct to exactly that many digits.
For example, if the answer is:
1.999999999999, and the error is 1E-5, the actual answer could be 2.000001234.
In this case, even the first decimal digit was wrong. But that really depends on how lucky you are (whether the answer falls on a boundary such as this).
This solution assumes that the doubles (including the answer) are all normalized. For denormalized results, obviously, the number of binary digits by which the result is denormalized reduces the accuracy by that many digits.
Answer 2:
If you ask for a tight bound, the response of @EricPostpischil (Answer 3 below) is the absolute error bound if all operations are performed in IEEE 754 double precision.
If you ask for confidence, I understand it as the statistical distribution of errors. Assuming a uniform distribution of error in [-e/2, e/2], you could ask for the theoretical distribution of error after M operations on Math Stack Exchange... I guess the tight bound is somewhat conservative.
Let's illustrate an experimental estimation of those stats with some Smalltalk code (any language having large integer/fraction arithmetic could do):
nOp := 500.
"Theoretical bound (1 + e/2)^(2*nOp - 1) - 1, with e = Float epsilon, in exact fraction arithmetic"
relativeErrorBound := ((1 + (Float epsilon asFraction / 2)) raisedTo: nOp * 2 - 1) - 1.0.
nToss := 1000.
stats := (1 to: nToss)
    collect: [:void |
        | fractions exactProduct floatProduct relativeError |
        "nOp random exact fractions, their exact product, and their floating-point product"
        fractions := (1 to: nOp) collect: [:each | 10000 atRandom / 3137].
        exactProduct := fractions inject: 1 into: [:prod :element | prod * element].
        floatProduct := fractions inject: 1.0 into: [:prod :element | prod * element].
        relativeError := (floatProduct asFraction - exactProduct) / exactProduct.
        relativeError].
s1 := stats detectSum: [:each | each].
s2 := stats detectSum: [:each | each squared].
maxEncounteredError := (stats detectMax: [:each | each abs]) abs asFloat.
estimatedMean := (s1 / nToss) asFloat.
estimatedStd := (s2 / (nToss - 1) - (s1 / nToss) squared) sqrt.
I get these results for the multiplication of nOp=20 doubles:
relativeErrorBound -> 4.440892098500626e-15
maxEncounteredError -> 1.250926201710214e-15
estimatedMean -> -1.0984634797115124e-18
estimatedStd -> 2.9607828266493842e-16
For nOp=100:
relativeErrorBound -> 2.220446049250313e-14
maxEncounteredError -> 2.1454964094158273e-15
estimatedMean -> -1.8768492273800676e-17
estimatedStd -> 6.529482793500846e-16
And for nOp=500:
relativeErrorBound -> 1.1102230246251565e-13
maxEncounteredError -> 4.550696454362764e-15
estimatedMean -> 9.51007740905571e-17
estimatedStd -> 1.4766176010100097e-15
You can observe that the standard deviation grows much more slowly than the error bound.
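The slow growth is consistent with sqrt(nOp) scaling; a quick Matlab check of the ratios of the estimatedStd values reported above (a sketch):
stds = [2.9607828266493842e-16 6.529482793500846e-16 1.4766176010100097e-15];
stds(2:3) ./ stds(1:2)   % ans ~ [2.21 2.26]
sqrt(5)                  % ans = 2.2361: each 5x increase in nOp scales the std by ~sqrt(5)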
UPDATE: to first approximation, (1+e)^m = 1 + m*e + O((m*e)^2), so the accumulated error is approximately a sum of m errors uniform in [-e,e], as long as m*e is small enough, and this sum is very nearly a normal (Gaussian) distribution of variance m*(2e)^2/12. You can check that std(sum(rand(100,5000))) is near sqrt(100/12) in Matlab.
We can consider this still holds for m = 2*10^12-1, which is approximately m = 2^41, giving m*e = 2^-12. In that case, the global error follows a quasi-normal distribution, and the standard deviation of the global error is sigma = 2^-52*sqrt(2^41/12), or approximately sigma = 10^-10.
See http://en.wikipedia.org/wiki/Normal_distribution to compute P(abs(error) > k*sigma):
- In 68% of cases (1 sigma), you'll have 10 digits of precision or more.
- erfc(10/sqrt(2)) gives the probability of having fewer than 9 digits of precision: about 1 case out of 6*10^22. So I let you compute the probability of having only 4 digits of precision (you can't evaluate it with double precision, it underflows)!!!
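Both figures are easy to reproduce in Matlab (a sketch, using m = 2*10^12-1 as above):
e = 2^-53;
m = 2*1e12 - 1;
sigma = 2*e*sqrt(m/12)   % ans ~ 9.1e-11, i.e. about 1e-10
erfc(10/sqrt(2))         % ans ~ 1.5e-23: P(abs(error) > 10*sigma), about 1 in 6.6e22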
My experimental standard deviations were a bit smaller than the theoretical ones (4e-16, 9e-16, 2e-15 for 20, 100 & 500 doubles), but this must be due to the biased distribution of my input errors, i/3137 for i=1..10000...
That's a good reminder that the result will be dominated by the distribution of errors in your inputs, which might exceed e if they themselves result from floating point operations like M_PI*num/den.
Also, as Eric said, using only * is quite an ideal case; things might degenerate quicker if you mix in +.
Last note: we can craft a list of inputs that reaches the maximum error bound: set all elements to (1+e), which will be rounded to 1.0, and we get the maximum theoretical error bound. But our input distribution is quite biased! HEM, WRONG: since all the multiplications are then exact, we get only (1+e)^n, not (1+e)^(2n-1), so about only half the error...
UPDATE 2: the inverse problem
Since you want the inverse: what is the length n of the sequence such that I get k digits of precision with a certain level of confidence (failure probability 10^-c)?
I'll answer only for k >= 8, because (m*e) << 1 is required in the above approximations.
Let's take c=7: getting k digits with a failure probability of 10^-7 means 5.3*sigma < 10^-k (for a normal distribution, P(abs(error) > 5.3*sigma) is about 10^-7).
With sigma = 2*e*sqrt((2*n-1)/12), that is n = 0.5 + 1.5*(sigma/e)^2, with e = 2^-53.
Thus n ~ 3*2^105*sigma^2, and as sigma^2 < 10^(-2k)/5.3^2, we can write n < 3*2^105*10^(-2k)/5.3^2.
Numerically, the probability of having fewer than k=9 digits is less than 10^-7 for a length n=4.3e12, and around n=4.3e10 for 10 digits.
We would reach n=4 numbers for 15 digits, but there our normal-distribution hypothesis is very rough and does not hold, especially in the distribution tail at 5 sigmas, so use with caution (the Berry-Esseen theorem bounds how far from normal such a distribution is: http://en.wikipedia.org/wiki/Berry-Esseen_theorem).
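The closing numbers follow directly from that last inequality; evaluated in Matlab (a sketch):
3 * 2^105 * (10^-9/5.3)^2    % ans ~ 4.3e12: length n for k=9 digits
3 * 2^105 * (10^-10/5.3)^2   % ans ~ 4.3e10: length n for k=10 digits
3 * 2^105 * (10^-15/5.3)^2   % ans ~ 4.3: the n=4 for 15 digits mentioned above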
Answer 3:
The relative error in M operations as described is at most (1+2^-53)^M - 1, assuming all input, intermediate, and final values do not underflow or overflow.
Consider converting a real number a0 to double precision. The result is some number a0•(1+e), where -2^-53 ≤ e ≤ 2^-53 (because conversion to double precision should always produce the closest representable value, the quantum for double precision values is 2^-52 of the highest bit, and the closest value is always within half a quantum). For further analysis, we will consider the worst case value of e, 2^-53.
When we multiply one (previously converted) value by another, the mathematically exact result is a0•(1+e) • a1•(1+e). The result of the calculation has another rounding error, so the calculated result is a0•(1+e) • a1•(1+e) • (1+e) = a0 • a1 • (1+e)^3. Obviously, this is a relative error of (1+e)^3. We can see the error accumulates simply as (1+e)^M for these operations: each operation multiplies all previous error terms by 1+e.
Given N inputs, there will be N conversions and N-1 multiplications, so the worst error will be (1+e)^(2N-1).
Equality for this error is achieved only for N≤1. Otherwise, the error must be less than this bound.
Note that an error bound this simple is possible only in a simple problem, such as this one with homogeneous operations. In typical floating-point arithmetic, with a mixture of addition, subtraction, multiplication, and other operations, computing a bound so simply is generally not possible.
For N=10^12 (M=2•10^12-1), the above bound is less than 2.000222062•10^12 units of 2^-53, and is less than .0002220693. So the calculated result is good to something under four decimal digits. (Remember, though, you need to avoid overflow and underflow.)
(Note on the strictness of the above calculation: I used Maple to calculate 1000 terms of the binomial (1+2^-53)^(2•10^12-1) exactly (having removed the initial 1 term) and to add a value that is provably larger than the sum of all remaining terms. Then I had Maple evaluate that exact result to 1000 decimal digits, and it was less than the bound I report above.)
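For a double-precision cross-check of that figure in Matlab (a sketch; expm1 and log1p keep the evaluation accurate, and the result agrees with the Maple bound to the digits shown):
M = 2*1e12 - 1;
expm1(M * log1p(2^-53))   % ans ~ 2.22069e-04, consistent with the .0002220693 bound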
Source: https://stackoverflow.com/questions/12047410/the-number-of-correct-decimal-digits-in-a-product-of-doubles-with-a-large-number