What is a good solution for calculating an average where the sum of all values exceeds a double's limits?


I have a requirement to calculate the average of a very large set of doubles (10^9 values). The sum of the values exceeds the upper bound of a double, so does anyone know a

  •  暗喜 (original poster)
    2020-11-29 18:53

    Consider this:

    avg(n1)         : n1                               = a1
    avg(n1, n2)     : ((1/2)*n1)+((1/2)*n2)            = ((1/2)*a1)+((1/2)*n2) = a2
    avg(n1, n2, n3) : ((1/3)*n1)+((1/3)*n2)+((1/3)*n3) = ((2/3)*a2)+((1/3)*n3) = a3
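
    The three cases above follow one pattern. Writing a_k for the average of the first k values (the a1, a2, a3 above), the general step can be stated as follows. The index k is notation introduced here for illustration and is not part of the original answer:

```latex
a_1 = n_1, \qquad
a_k = \frac{k-1}{k}\,a_{k-1} + \frac{1}{k}\,n_k
    = a_{k-1} + \frac{n_k - a_{k-1}}{k} \quad (k \ge 2)
```

    In the first form each a_k is a weighted average of a_{k-1} and n_k, so, apart from rounding, it stays within the range of the inputs even when their sum would overflow.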
    

    So for any set of doubles of arbitrary size, you could do this (this is in C#, but I'm pretty sure it could be easily translated to Java):

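    // Requires: using System.Collections.Generic;
    static double GetAverage(IEnumerable<double> values) {
        int i = 0; // number of values already folded into avg
        double avg = 0.0;
        foreach (double value in values) {
            // Reweight the previous mean by i/(i+1), then add the new value's 1/(i+1) share.
            avg = (((double)i / (double)(i + 1)) * avg) + ((1.0 / (double)(i + 1)) * value);
            i++;
        }
    
        return avg;
    }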
    

    Actually, this simplifies nicely into (already provided by martinus):

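    static double GetAverage(IEnumerable<double> values) {
        int i = 1; // 1-based position of the value being folded in
        double avg = 0.0;
        foreach (double value in values) {
            // Move the mean toward the new value by 1/i of the gap.
            avg += (value - avg) / (i++);
        }
    
        return avg;
    }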
    

    I wrote a quick test to try this function out against the more conventional method of summing up the values and dividing by the count (GetAverage_old). For my input I wrote this quick function to return as many random positive doubles as desired:

    // Lazily yields numValues pseudo-random doubles in [0, maxValue); a fixed seed makes runs repeatable.
    static IEnumerable<double> GetRandomDoubles(long numValues, double maxValue, int seed) {
        Random r = new Random(seed);
        for (long i = 0L; i < numValues; i++)
            yield return r.NextDouble() * maxValue;
    }
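
    GetAverage_old is named in this answer but never shown. A minimal sketch of what it presumably looks like, assuming it is the plain sum-then-divide approach described above (the class name AverageSketch and the body are a reconstruction, not the author's code):

```csharp
using System;
using System.Collections.Generic;

static class AverageSketch {
    // Hypothetical reconstruction: the answer refers to GetAverage_old without listing it.
    public static double GetAverage_old(IEnumerable<double> values) {
        double sum = 0.0;
        long count = 0;
        foreach (double value in values) {
            sum += value;   // becomes Infinity once the running total passes double.MaxValue
            count++;
        }
        return sum / count; // NaN for an empty sequence (0.0 / 0)
    }
}
```

    The running total is the only place this version can overflow, which is the step the incremental average avoids.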
    

    And here are the results of a few test trials:

    long N = 100L;
    double max = double.MaxValue * 0.01;
    
    IEnumerable<double> doubles = GetRandomDoubles(N, max, 0);
    double oldWay = GetAverage_old(doubles); // 1.00535024998431E+306
    double newWay = GetAverage(doubles); // 1.00535024998431E+306
    
    doubles = GetRandomDoubles(N, max, 1);
    oldWay = GetAverage_old(doubles); // 8.75142021696299E+305
    newWay = GetAverage(doubles); // 8.75142021696299E+305
    
    doubles = GetRandomDoubles(N, max, 2);
    oldWay = GetAverage_old(doubles); // 8.70772312848651E+305
    newWay = GetAverage(doubles); // 8.70772312848651E+305
    

    OK, but what about for 10^9 values?

    long N = 1000000000;
    double max = 100.0; // we start small, to verify accuracy
    
    IEnumerable<double> doubles = GetRandomDoubles(N, max, 0);
    double oldWay = GetAverage_old(doubles); // 49.9994879713857
    double newWay = GetAverage(doubles); // 49.9994879713868 -- pretty close
    
    max = double.MaxValue * 0.001; // now let's try something enormous
    
    doubles = GetRandomDoubles(N, max, 0);
    oldWay = GetAverage_old(doubles); // Infinity
    newWay = GetAverage(doubles); // 8.98837362725198E+305 -- no overflow
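
    The overflow case can also be reproduced without generating 10^9 values. The sketch below uses three copies of half of double.MaxValue; the names OverflowDemo and RunningMean are made up for this illustration, and RunningMean simply repeats the update rule from GetAverage:

```csharp
using System;
using System.Collections.Generic;

static class OverflowDemo {
    // Same update rule as the simplified GetAverage: avg += (value - avg) / i.
    public static double RunningMean(IEnumerable<double> values) {
        long i = 1;
        double avg = 0.0;
        foreach (double value in values) {
            avg += (value - avg) / (i++);
        }
        return avg;
    }

    public static void Main() {
        double half = double.MaxValue * 0.5;
        double[] values = { half, half, half };

        // Conventional approach: the third addition pushes the sum past double.MaxValue.
        double sum = 0.0;
        foreach (double v in values) sum += v;

        Console.WriteLine(double.IsInfinity(sum / values.Length)); // True
        Console.WriteLine(RunningMean(values) == half);            // True
    }
}
```

    One caveat, under the assumption that inputs may be negative: value - avg can itself overflow when a value and the current average have opposite signs and both are close to double.MaxValue in magnitude. The trials in this answer use only positive values, so they do not exercise that case.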
    

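    Naturally, whether this solution is acceptable depends on your accuracy requirements: in the 10^9-value trial above, the two methods agree to 13 significant digits (49.9994879713857 vs. 49.9994879713868). But it's worth considering.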
