What is a good solution for calculating an average where the sum of all values exceeds a double's limits?


I have a requirement to calculate the average of a very large set of doubles (10^9 values). The sum of the values exceeds the upper bound of a double, so does anyone know a

17 Answers
  • 2020-11-29 19:08

    Option 1 is to use an arbitrary-precision library, so that there is no upper bound on the sum.

    Other options (which lose precision) are to sum in groups rather than all at once, or to divide before summing.
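
    As a rough illustration of the arbitrary-precision option, here is a minimal Python sketch that uses the standard-library `fractions.Fraction` as the exact type. The function name `exact_mean` and the sample data are made up for this example, and exact rationals are only one possible choice of library.

```python
from fractions import Fraction

def exact_mean(values):
    """Mean of floats computed with exact rational arithmetic."""
    total = Fraction(0)
    count = 0
    for x in values:
        total += Fraction(x)  # every finite float converts to a Fraction exactly
        count += 1
    if count == 0:
        raise ValueError("mean of an empty sequence")
    return float(total / count)  # round once, at the very end

big = [2.0**1023] * 4       # each value is close to the largest double
print(sum(big))             # inf: the plain float sum overflows
print(exact_mean(big) == 2.0**1023)
```

    The price is speed and memory: exact arithmetic is much slower than native doubles, which matters at 10^9 values.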

  • 2020-11-29 19:08

    Check out the section on the cumulative moving average.
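
    For reference, the cumulative moving average updates the mean with each new value instead of accumulating a sum. A minimal Python sketch, with the hypothetical name `cumulative_average`, might look like this. It assumes that `x - mean` itself does not overflow, which can fail if the data mixes huge positive and huge negative values.

```python
def cumulative_average(values):
    """Yield the running mean after each value, never forming the full sum."""
    mean = 0.0
    for n, x in enumerate(values, start=1):
        # CMA update: mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n
        mean += (x - mean) / n
        yield mean

# Works on any iterable, so the number of values need not be known up front.
*_, final = cumulative_average([2.0**1023] * 4)
print(final == 2.0**1023)
```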

  • 2020-11-29 19:10

    Divide all values by the set size and then sum them up.
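
    A minimal Python sketch of this idea, assuming the values are in a list so that the size is known up front (the name `mean_divide_first` is made up for the example):

```python
def mean_divide_first(values):
    """Mean of a list, scaling each value by 1/n before accumulating."""
    n = len(values)
    if n == 0:
        raise ValueError("mean of an empty sequence")
    total = 0.0
    for x in values:
        # Each term is about |x| / n, so the partial sums stay around
        # the magnitude of the largest value instead of n times that.
        total += x / n
    return total

print(mean_divide_first([2.0**1023] * 4) == 2.0**1023)
```

    The trade-off is one extra rounding per element, and very small values may underflow when divided by a large n.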

  • 2020-11-29 19:12

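    Why so many long, complicated answers? Here is the simplest way to keep a running average, without needing to know the number of elements in advance:

    #include <stdio.h>

    /* Running average of doubles read from stdin; no count needed up front. */
    double running_average(void) {
        long i = 0;
        double average = 0, x;
        while (scanf("%lf", &x) == 1) {
            /* Cast so i / (i + 1) is floating-point, not integer, division. */
            average = average * ((double)i / (i + 1)) + x / (i + 1);
            i++;
        }
        return average;
    }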

  • 2020-11-29 19:13

    The very first question I'd like to ask you is this:

    • Do you know the number of values beforehand?

    If not, then you have little choice but to sum, count, and divide to get the average. If Double cannot hold that sum, then tough luck: you can't use Double, and you need to find a data type that can handle it.

    If, on the other hand, you do know the number of values beforehand, you can look at what you're really doing and change how you do it, but keep the overall result.

    The average of N values, stored in some collection A, is this:

    A[0]   A[1]   A[2]   A[3]          A[N-2]   A[N-1]
    ---- + ---- + ---- + ---- + .... + ------ + ------
     N      N      N      N               N        N
    

    To calculate subsets of this result, you can split up the calculation into equally sized sets, so you can do this for 3-valued sets (assuming the number of values is divisible by 3; otherwise you need a different divisor):

    / A[0]   A[1]   A[2] \   / A[3]   A[4]   A[5] \   //      A[N-2]   A[N-1] \
    | ---- + ---- + ---- |   | ---- + ---- + ---- |   \\    + ------ + ------ |
    \  3      3      3   /   \  3      3      3   /   //        3        3    /
     --------------------- +  --------------------  + \\      ----------------
              N                        N                         N
             ---                      ---                       ---
              3                        3                         3
    

    Note that you need equally sized sets; otherwise the last set, which has fewer values than the sets before it, will be weighted incorrectly in the final result.

    Consider the numbers 1-7 in sequence. If you pick a set size of 3, you'll get this result:

    / 1   2   3 \   / 4   5   6 \   / 7 \ 
    | - + - + - | + | - + - + - | + | - |
    \ 3   3   3 /   \ 3   3   3 /   \ 3 /
     -----------     -----------     ---
          y               y           y
    

    which gives:

         2   5   7/3
         - + - + ---
         y   y    y
    

    If y is 3 for all the sets, you get this:

         2   5   7/3
         - + - + ---
         3   3    3
    

    which gives:

    2*3   5*3    7
    --- + --- + ---
     9     9     9
    

    which is:

    6   15   7
    - + -- + -
    9    9   9
    

    which totals:

    28
    -- ~ 3.1111...
     9
    

    The average of 1-7 is 4, so this obviously doesn't work. Note that if you do the above exercise with the numbers 1, 2, 3, 4, 5, 6, 7, 0, 0 (note the two zeroes at the end), you get exactly the above result.

    In other words, if you can't split the values into equally sized sets, the last set is counted as though it had the same number of values as all the sets preceding it, with zeroes standing in for the missing values.

    So, you need equally sized sets. Tough luck if your original input set consists of a prime number of values.
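
    The 1-7 walkthrough above can be reproduced with a short Python sketch. `padded_chunk_mean` follows the scheme as described. `weighted_chunk_mean` is a variation that is not part of the original reasoning: it weights each set's own mean by its share of all values, which is one way to drop the equal-size requirement. Both function names are made up for this example.

```python
def padded_chunk_mean(values, size):
    """The scheme walked through above: every set is treated as `size` values."""
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    return sum(sum(x / size for x in c) for c in chunks) / len(chunks)

def weighted_chunk_mean(values, size):
    """Variation: weight each set's own mean by its share of all values."""
    n = len(values)
    total = 0.0
    for i in range(0, n, size):
        chunk = values[i:i + size]
        chunk_mean = sum(x / len(chunk) for x in chunk)
        total += chunk_mean * (len(chunk) / n)
    return total

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(padded_chunk_mean(data, 3))    # about 3.111 (28/9), not the true mean
print(weighted_chunk_mean(data, 3))  # about 4.0
```

    With weights, a prime number of values is no longer a problem, at the cost of one extra multiplication per set.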

    What I'm worried about here, though, is loss of precision. I'm not entirely sure Double will give you good enough precision in such a case if it cannot hold the entire sum of the values to begin with.
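
    One common way to limit that loss is compensated (Kahan) summation, which tracks the low-order bits that each addition drops. It is not something proposed above, so treat the following Python sketch as an assumption-laden illustration: it divides each value by the count first and then applies Kahan summation, and the name `kahan_mean` is made up. The result is compared against an exactly computed mean.

```python
from fractions import Fraction

def kahan_mean(values):
    """Mean via Kahan-compensated summation of the pre-scaled terms x / n."""
    n = len(values)
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x / n - comp        # re-inject the error from the previous step
        t = total + y
        comp = (t - total) - y  # what the addition above rounded away
        total = t
    return total

data = [0.1] * 1000
exact = float(sum(Fraction(x) for x in data) / len(data))
print(abs(kahan_mean(data) - exact))  # distance from the exactly rounded mean
```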
