How to calculate moving average in Python 3?

挽巷 2020-12-06 02:21

Let's say I have a list:

y = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

I want to create a function that calculates the moving average of this list.

5 answers
  • 2020-12-06 02:31

    Use the sum and map functions.

    # average of the n values ending just before index num
    print(sum(map(int, x[num-n:num])) / n)
    

    The map function in Python 3 is basically a lazy version of this:

    [int(i) for i in x[num-n:num]]
    

    sum adds those integers together; dividing the total by n turns it into the average.
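    Put together, the sum/map idea gives a complete function. This is a minimal sketch: the name moving_average and the choice to return only full windows are assumptions, not part of the answer above.

```python
def moving_average(values, n):
    """Average of each run of n consecutive values (hypothetical helper).

    values may hold numeric strings, as in the question; map(int, ...)
    converts each window before sum() adds it up.
    """
    # One result per full window: the window ending just before index num
    return [sum(map(int, values[num - n:num])) / n
            for num in range(n, len(values) + 1)]


y = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
print(moving_average(y, 5))  # [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

    Re-slicing and re-summing every window costs O(n) per result, which is fine for short lists like this one.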

  • 2020-12-06 02:32

    An old version of the Python docs has a great sliding-window generator among its itertools examples:

    from itertools import islice
    
    def window(seq, n=2):
        "Returns a sliding window (of width n) over data from the iterable"
        "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
        it = iter(seq)
        result = tuple(islice(it, n))
        if len(result) == n:
            yield result    
        for elem in it:
            result = result[1:] + (elem,)
            yield result
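    To see what window() produces on its own, here is a short check. The definition is repeated so the snippet runs by itself; the sample inputs are made up for illustration.

```python
from itertools import islice


def window(seq, n=2):
    """Returns a sliding window (of width n) over data from the iterable."""
    it = iter(seq)
    result = tuple(islice(it, n))  # first full window (or fewer items if seq is short)
    if len(result) == n:
        yield result
    for elem in it:
        # drop the oldest element, append the newest
        result = result[1:] + (elem,)
        yield result


print(list(window(range(5), 3)))  # [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
print(list(window([1, 2], 3)))    # [] because the input is shorter than the window
```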
    

    Using that, computing your moving averages is trivial:

    from __future__ import division  # For Python 2
    
    def moving_averages(values, size):
        for selection in window(values, size):
            yield sum(selection) / size
    

    Running this against your input (mapping the strings to integers) gives:

    >>> y= ['1', '2', '3', '4','5','6','7','8','9','10']
    >>> for avg in moving_averages(map(int, y), 5):
    ...     print(avg)
    ... 
    3.0
    4.0
    5.0
    6.0
    7.0
    8.0
    

    To yield None for the first size - 1 iterations (the 'incomplete' windows), expand the moving_averages function a little:

    def moving_averages(values, size):
        for _ in range(size - 1):
            yield None
        for selection in window(values, size):
            yield sum(selection) / size
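    A quick run of the padded version on the question's data, with window() repeated so the snippet is self-contained. As long as the input has at least size values, the padding makes the output exactly as long as the input.

```python
from itertools import islice


def window(seq, n=2):
    """Sliding window of width n over any iterable."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result


def moving_averages(values, size):
    # Pad the first size - 1 positions, where no full window exists yet
    for _ in range(size - 1):
        yield None
    for selection in window(values, size):
        yield sum(selection) / size


y = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
print(list(moving_averages(map(int, y), 5)))
# [None, None, None, None, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```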
    
  • 2020-12-06 02:40

    I like Martijn's answer, but, like george, I wondered whether a running sum would be faster than applying sum() over and over again to mostly the same numbers.
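    The running-sum idea in isolation, as a minimal sketch (the name running_moving_averages is hypothetical, and it yields only full windows): each step adds the newest value and subtracts the one that drops out instead of re-summing the window. A deque with maxlen evicts its oldest item automatically on append, so the value about to leave is read first.

```python
from collections import deque


def running_moving_averages(values, size):
    """Yield the average of each full window of `size` values (hypothetical helper)."""
    win = deque(maxlen=size)
    total = 0.0
    for v in values:
        if len(win) == size:
            total -= win[0]  # this value is evicted by the append below
        win.append(v)        # with maxlen set, a full deque drops win[0]
        total += v
        if len(win) == size:
            yield total / size


print(list(running_moving_averages(range(1, 11), 5)))
# [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

    With float inputs, repeated adding and subtracting can accumulate rounding error over very long runs, which re-summing each window avoids.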

    Also the idea of having None values as default during the ramp up phase is interesting. In fact there may be plenty of different scenarios one could conceive for moving averages. Let's split the calculation of averages into three phases:

    1. Ramp Up: Starting iterations where the current iteration count < window size
    2. Steady Progress: We have exactly window size number of elements available to calculate a normal average := sum(x[iteration_counter-window_size:iteration_counter])/window_size
    3. Ramp Down: At the end of the input data, we could return another window_size - 1 "average" numbers.
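    For a plain list, the three phases can be illustrated with slicing. This is only a sketch of the idea, not the function that follows: the name phase_averages is hypothetical, it assumes len(values) >= size, and it divides the partial ramp-up/ramp-down windows by size rather than by the number of values they hold.

```python
def phase_averages(values, size):
    """Split moving averages over a list into the three phases (illustrative)."""
    n = len(values)
    # Ramp up: the first size - 1 prefixes, still divided by size
    ramp_up = [sum(values[:k]) / size for k in range(1, size)]
    # Steady progress: every full window of exactly size values
    steady = [sum(values[i - size:i]) / size for i in range(size, n + 1)]
    # Ramp down: the last size - 1 suffixes, shrinking by one value each step
    ramp_down = [sum(values[n - k:]) / size for k in range(size - 1, 0, -1)]
    return ramp_up, steady, ramp_down


print(phase_averages([3, 6, 9, 12], 3))
# ([1.0, 3.0], [6.0, 9.0], [7.0, 4.0])
```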

    Here's a function that accepts

    • Arbitrary iterables (generators are fine) as input for data
    • Arbitrary window sizes >= 1
    • Parameters to switch on/off production of values during the phases for Ramp Up/Down
    • Callback functions for those phases to control how values are produced. This can be used to constantly provide a default (e.g. None) or to provide partial averages

    Here's the code:

    from collections import deque 
    
    def moving_averages(data, size, rampUp=True, rampDown=True):
        """Slide a window of <size> elements over <data> to calc an average
    
        First and last <size-1> iterations when window is not yet completely
        filled with data, or the window empties due to exhausted <data>, the
        average is computed with just the available data (but still divided
        by <size>).
        Set rampUp/rampDown to False in order to not provide any values during
        those start and end <size-1> iterations.
        Set rampUp/rampDown to functions to provide arbitrary partial average
        numbers during those phases. The callback will get the currently
        available input data in a deque. Do not modify that data.
        """
        d = deque()
        running_sum = 0.0
    
        data = iter(data)
        # rampUp
        for count in range(1, size):
            try:
                val = next(data)
            except StopIteration:
                break
            running_sum += val
            d.append(val)
            #print("up: running sum:" + str(running_sum) + "  count: " + str(count) + "  deque: " + str(d))
            if rampUp:
                if callable(rampUp):
                    yield rampUp(d)
                else:
                    yield running_sum / size
    
        # steady
        exhausted_early = True
        for val in data:
            exhausted_early = False
            running_sum += val
            #print("st: running sum:" + str(running_sum) + "  deque: " + str(d))
            yield running_sum / size
            d.append(val)
            running_sum -= d.popleft()
    
        # rampDown
        if rampDown:
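            if exhausted_early and d:
                # ramp-up already yielded the full partial window, so start by
                # dropping the oldest value; "and d" avoids IndexError on empty input
                running_sum -= d.popleft()
            for _ in range(min(len(d), size-1), 0, -1):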
                #print("dn: running sum:" + str(running_sum) + "  deque: " + str(d))
                if callable(rampDown):
                    yield rampDown(d)
                else:
                    yield running_sum / size
                running_sum -= d.popleft()
    

    In my tests it is about 5 to 10% faster than Martijn's version for inputs of 100 elements or more (and slightly slower for 10 elements), though Martijn's version is far more elegant. Here's the test code:

    print("")
    print("Timeit")
    print("-" * 80)
    
    from itertools import islice
    def window(seq, n=2):
        "Returns a sliding window (of width n) over data from the iterable"
        "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
        it = iter(seq)
        result = tuple(islice(it, n))
        if len(result) == n:
            yield result    
        for elem in it:
            result = result[1:] + (elem,)
            yield result
    
    # Martijn's version:
    def moving_averages_SO(values, size):
        for selection in window(values, size):
            yield sum(selection) / size
    
    
    import timeit
    problems = [int(i) for i in (10, 100, 1000, 10000, 1e5, 1e6, 1e7)]
    for problem_size in problems:
        print("{:12s}".format(str(problem_size)), end="")
    
        so = timeit.repeat("list(moving_averages_SO(range("+str(problem_size)+"), 5))", number=1*max(problems)//problem_size,
                           setup="from __main__ import moving_averages_SO")
        print("{:12.3f} ".format(min(so)), end="")
    
        my = timeit.repeat("list(moving_averages(range("+str(problem_size)+"), 5, False, False))", number=1*max(problems)//problem_size,
                           setup="from __main__ import moving_averages")
        print("{:12.3f} ".format(min(my)), end="")
    
        print("")
    

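    And the output (columns: problem size, then the best timeit.repeat time in seconds for Martijn's version and for this version):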

    Timeit
    --------------------------------------------------------------------------------
    10                 7.242        7.656 
    100                5.816        5.500 
    1000               5.787        5.244 
    10000              5.782        5.180 
    100000             5.746        5.137 
    1000000            5.745        5.198 
    10000000           5.764        5.186 
    

    The original question can now be solved with this function call:

    print(list(moving_averages(range(1,11), 5,
                               rampUp=lambda _: None,
                               rampDown=False)))
    

    The output:

    [None, None, None, None, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    
  • 2020-12-06 02:44

    Another solution extends the itertools recipe pairwise() to an nwise() function, which gives you the sliding window (and works when the iterable is a generator):

    import itertools as it
    
    def nwise(iterable, n):
        # n independent copies of the input; copy c is advanced by c items
        ts = it.tee(iterable, n)
        for c, t in enumerate(ts):
            next(it.islice(t, c, c), None)
        return zip(*ts)
    
    def moving_averages_nw(iterable, n):
        yield from (sum(x)/n for x in nwise(iterable, n))
    
    >>> list(moving_averages_nw(range(1, 11), 5))
    [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
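    To check the claim that nwise() accepts a generator, here is a self-contained run that feeds it a generator expression converting the question's strings. The loop uses the itertools "consume" idiom: next(islice(t, c, c), None) advances copy c by c items, so the copies are staggered by one position each.

```python
import itertools as it


def nwise(iterable, n):
    ts = it.tee(iterable, n)            # n independent iterators over the same data
    for c, t in enumerate(ts):
        next(it.islice(t, c, c), None)  # advance copy c by c items
    return zip(*ts)                     # zip stops when the furthest copy runs out


def moving_averages_nw(iterable, n):
    yield from (sum(x) / n for x in nwise(iterable, n))


y = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
print(list(moving_averages_nw((int(s) for s in y), 5)))
# [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```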
    

    It has a relatively high setup cost for short iterables, but that cost matters less as the data set grows. It still calls sum() on every window, but the code is reasonably elegant:

    Timeit              MP           cfi          nwise
    --------------------------------------------------------------------------------
    10                 4.658        4.959        7.351 
    100                5.144        4.070        4.234 
    1000               5.312        4.020        3.977 
    10000              5.317        4.031        3.966 
    100000             5.508        4.115        4.087 
    1000000            5.526        4.263        4.202 
    10000000           5.632        4.326        4.242 
    
  • 2020-12-06 02:47

    An approach that avoids recomputing intermediate sums by keeping a list of running (prefix) sums:

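    values = range(0, 12)  # not named list, which would shadow the built-in
    def runs(v):
        global runningsum
        runningsum += v
        return runningsum
    runningsum = 0
    # the leading 0 makes runsumlist[k] the sum of the first k values
    runsumlist = [0] + [runs(v) for v in values]
    result = [(runsumlist[k] - runsumlist[k-5]) / 5 for k in range(5, len(runsumlist))]
    
    print(result)
    
    [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]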
    

    If your values are strings, as in the question, use runs(int(v)) instead, and wrap each average in repr((runsumlist[k] - runsumlist[k-5]) / 5) if you want to carry the results around as strings too.
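    The same prefix-sum idea can also be written without the global by letting itertools.accumulate build the running sums. A minimal sketch, assuming Python 3.8+ for the initial= keyword; the function name is illustrative.

```python
from itertools import accumulate


def prefix_sum_averages(values, n):
    """Moving averages via prefix sums (illustrative helper)."""
    # prefix[k] is the sum of the first k values; initial=0 supplies prefix[0]
    prefix = list(accumulate(values, initial=0))
    # each window sum is the difference of two prefix sums: O(1) per window
    return [(prefix[k] - prefix[k - n]) / n for k in range(n, len(prefix))]


print(prefix_sum_averages(range(0, 12), 5))
# [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```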


    An alternative without the global:

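    values = [float(x) for x in range(0, 12)]
    nave = 5
    # start with the average of the first full window
    movingave = [sum(values[:nave]) / nave]
    # slide the window: add the value entering, subtract the one leaving
    for i in range(len(values) - nave):
        movingave.append(movingave[-1] + (values[i+nave] - values[i]) / nave)
    print(movingave)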
    

    Be sure to do floating-point math even if your input values are integers (in Python 2, / between two integers truncates). The output:

    [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    