compute mean in python for a generator

我的未来我决定 提交于 2019-12-03 10:53:46

Just one simple change to your code would let you use both. Generators were meant to be used interchangeably to lists in a for-loop.

def my_mean(values):
    n = 0
    Sum = 0.0
    for v in values:
        Sum += v
        n += 1
    return Sum / n

In general if you're doing a streaming mean calculation of floating point numbers, you're probably better off using a more numerically stable algorithm than simply summing the generator and dividing by the length.

The simplest of these (that I know) is usually credited to Knuth, and also calculates variance. The link contains a python implementation, but just the mean portion is copied here for completeness.

def mean(data):
    n = 0
    mean = 0.0

    for x in data:
        n += 1
        mean += (x - mean)/n

    if n < 1:
        return float('nan');
    else:
        return mean

I know this question is super old, but it's still the first hit on google, so it seemed appropriate to post. I'm still sad that the python standard library doesn't contain this simple piece of code.

def my_mean(values):
    total = 0
    for n, v in enumerate(values, 1):
        total += v
    return total / n

print my_mean(X)
print my_mean(Y)

There is statistics.mean() in Python 3.4 but it calls list() on the input:

def mean(data):
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    return _sum(data)/n

where _sum() returns an accurate sum (math.fsum()-like function that in addition to float also supports Fraction, Decimal).

Jimmy

The old-fashioned way to do it:

def my_mean(values):
   sum, n = 0, 0
   for x in values:
      sum += x
      n += 1
   return float(sum)/n

One way would be

numpy.fromiter(Y, int).mean()

but this actually temporarily stores the numbers.

Your approach is a good one, but you should instead use the for x in y idiom instead of repeatedly calling next until you get a StopIteration. This works for both lists and generators:

def my_mean(values):
    n = 0
    Sum = 0.0

    for value in values:
        Sum += value
        n += 1
    return float(Sum)/n
def my_mean(values):
    n = 0
    sum = 0
    for v in values:
        sum += v
        n += 1
    return sum/n

The above is very similar to your code, except by using for to iterate values you are good no matter if you get a list or an iterator. The python sum method is however very optimized, so unless the list is really, really long, you might be more happy temporarily storing the data.

(Also notice that since you are using python3, you don't need float(sum)/n)

If you know the length of the generator in advance and you want to avoid storing the full list in memory, you can use:

reduce(np.add, generator)/length

You can use reduce without knowing the size of the array:

from itertools import izip, count
reduce(lambda c,i: (c*(i[1]-1) + float(i[0]))/i[1], izip(values,count(1)),0)

Try:

import itertools

def mean(i):
    (i1, i2) = itertools.tee(i, 2)
    return sum(i1) / sum(1 for _ in i2)

print mean([1,2,3,4,5])

tee will duplicate your iterator for any iterable i (e.g. a generator, a list, etc.), allowing you to use one duplicate for summing and the other for counting.

(Note that 'tee' will still use intermediate storage).

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!