Compute mean in Python for a generator

I'm doing some statistics work and have a (large) collection of random numbers whose mean I need to compute. I'd like to work with generators, because I just need to compute the

10 answers

轻奢々

2021-01-01 22:36
In general if you're doing a streaming mean calculation of floating point numbers, you're probably better off using a more numerically stable algorithm than simply summing the generator and dividing by the length.

The simplest of these (that I know of) is usually credited to Knuth, and it also calculates the variance. The link contains a Python implementation, but just the mean portion is copied here for completeness.
```
def mean(data):
    n = 0
    mean = 0.0

    for x in data:
        n += 1
        # Running update: move the mean toward x by 1/n of the gap
        mean += (x - mean) / n

    if n < 1:
        return float('nan')
    return mean
```
I know this question is super old, but it's still the first hit on Google, so it seemed appropriate to post. I'm still sad that the Python standard library doesn't contain this simple piece of code.
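
Since the answer mentions that the same algorithm also calculates variance, here is a minimal sketch of how that part might look. It assumes the usual one-pass (Welford-style) update and returns the sample variance; the name `mean_and_variance` and the `m2` accumulator are my own choices for illustration, not taken from the linked implementation.

```python
def mean_and_variance(data):
    """One-pass running mean and sample variance (Welford-style update)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean

    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses both the old and the updated mean

    if n < 1:
        return float('nan'), float('nan')
    if n < 2:
        return mean, float('nan')  # sample variance is undefined for one point
    return mean, m2 / (n - 1)


m, var = mean_and_variance(x for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(m, var)
```

Like the mean-only version, this consumes the generator once and keeps only three numbers in memory.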

清酒与你

2021-01-01 22:36
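If you know the length of the generator in advance and you want to avoid storing the full list in memory, you can use the following (with NumPy imported as `np`; in Python 3, `reduce` must be imported from `functools`):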
```
reduce(np.add, generator)/length
```
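
For plain Python numbers the same idea probably doesn't need NumPy at all. A minimal sketch, assuming the length is known up front; the `random_numbers` generator and the `length` value are hypothetical stand-ins for the real data:

```python
import random

def random_numbers(count, seed=0):
    """Hypothetical generator yielding `count` pseudo-random floats in [0, 1)."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.random()

length = 100000
# sum() consumes the generator one item at a time, so no list is built.
mean = sum(random_numbers(length)) / length
print(mean)
```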

傲寒

2021-01-01 22:47

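```
def my_mean(values):
    total = 0
    n = 0  # stays 0 if the iterable is empty
    for n, v in enumerate(values, 1):
        total += v
    if n == 0:
        raise ValueError('my_mean requires at least one data point')
    return total / n

print(my_mean(X))
print(my_mean(Y))
```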

There is statistics.mean() in Python 3.4 but it calls list() on the input:

```
def mean(data):
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    return _sum(data)/n
```

where _sum() returns an accurate sum (a math.fsum()-like function that, in addition to float, also supports Fraction and Decimal).
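
If only floats are needed, the `list()` call can be avoided while keeping an accurate sum. This is a rough sketch, not the standard library's code: it counts items as they pass through to `math.fsum()`, and the names `streaming_fmean` and `counted` are made up for the example.

```python
import math

def streaming_fmean(data):
    """Mean of any iterable in one pass, summing with math.fsum()."""
    n = 0

    def counted(iterable):
        # Pass items through unchanged while counting them.
        nonlocal n
        for n, x in enumerate(iterable, 1):
            yield x

    total = math.fsum(counted(data))
    if n < 1:
        raise ValueError('streaming_fmean requires at least one data point')
    return total / n

print(streaming_fmean(x for x in range(1, 6)))
```

Python 3.8 also added `statistics.fmean()`, which accepts an iterator and may be worth checking before writing your own.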

粉色の甜心

2021-01-01 22:48

The old-fashioned way to do it:

```
def my_mean(values):
    # "total" avoids shadowing the built-in sum()
    total, n = 0, 0
    for x in values:
        total += x
        n += 1
    return float(total) / n
```

陌清茗

2021-01-01 22:49
You can use reduce without knowing the size of the array:
```
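from functools import reduce  # reduce is a builtin in Python 2
from itertools import count
# i is a (value, index) pair; c is the mean of the first index - 1 values
reduce(lambda c, i: (c * (i[1] - 1) + float(i[0])) / i[1], zip(values, count(1)), 0)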
```
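
The `reduce()` expression above is dense, so here is the same update written as a plain loop. It is only a sketch for readability; `running_mean` is a name chosen for the example.

```python
def running_mean(values):
    """Loop form of the reduce() expression: update the mean item by item."""
    mean = 0
    for i, x in enumerate(values, 1):
        # mean * (i - 1) recovers the sum of the first i - 1 items
        mean = (mean * (i - 1) + float(x)) / i
    return mean

print(running_mean(x for x in [1, 2, 3, 4, 5]))
```

Like the `reduce()` call with its initial value of 0, this returns 0 for an empty input rather than raising an error.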

旧时难觅i

2021-01-01 22:51
Try:
```
import itertools

def mean(i):
    (i1, i2) = itertools.tee(i, 2)
    return sum(i1) / sum(1 for _ in i2)

print(mean([1, 2, 3, 4, 5]))
```
tee will duplicate your iterator for any iterable i (e.g. a generator, a list, etc.), allowing you to use one duplicate for summing and the other for counting.

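(Note that `tee` will still use intermediate storage: because one copy is summed completely before the other is counted, `tee` has to buffer every item, so this uses about as much memory as building a list.)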