How to calculate moving average in Python 3?

挽巷 2020-12-06 02:21

Let's say I have a list:

y = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

I want to create a function that calculates

5 Answers
  •  萌比男神i
    2020-12-06 02:40

    I like Martijn's answer, but like george I wondered whether a running sum would be faster than applying sum() over and over again to mostly the same numbers.
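    To make the difference concrete, here is a minimal sketch of both approaches for full windows only. The function names are invented for this illustration and are not taken from either answer.

```python
from collections import deque


def averages_resummed(values, size):
    """Recompute sum() over every full window (the repeated-summation approach)."""
    values = list(values)
    for end in range(size, len(values) + 1):
        yield sum(values[end - size:end]) / size


def averages_running(values, size):
    """Keep one running total: add the newest value, drop the oldest."""
    window = deque()
    total = 0
    for value in values:
        window.append(value)
        total += value
        if len(window) > size:
            # The window grew past `size`, so remove the oldest value again.
            total -= window.popleft()
        if len(window) == size:
            yield total / size


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(list(averages_resummed(data, 5)))  # [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(list(averages_running(data, 5)))   # [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

    The running version does a constant amount of work per element regardless of window size. With float inputs, rounding error can accumulate in the running total over long inputs, which the resummed version avoids.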

    Also the idea of having None values as default during the ramp up phase is interesting. In fact there may be plenty of different scenarios one could conceive for moving averages. Let's split the calculation of averages into three phases:

    1. Ramp Up: the starting iterations, where the current iteration count < window size
    2. Steady Progress: exactly window size elements are available, so a normal average can be calculated := sum(x[iteration_counter-window_size:iteration_counter])/window_size
    3. Ramp Down: at the end of the input data, we could return another window_size - 1 "average" numbers.
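    The three phases can be sketched with plain list slices. This is only an illustration of which values each phase sees, assuming the input is a list; `phase_windows` is a hypothetical helper, not part of the function in this answer.

```python
def phase_windows(x, window_size):
    """Return the windows seen during ramp up, steady progress and ramp down."""
    n = len(x)
    # Ramp up: the window is still filling (1 .. window_size - 1 elements).
    ramp_up = [x[:i] for i in range(1, min(window_size, n + 1))]
    # Steady progress: the window holds exactly window_size elements.
    steady = [x[i - window_size:i] for i in range(window_size, n + 1)]
    # Ramp down: the input is exhausted and the window empties again.
    ramp_down = [x[i:] for i in range(max(n - window_size + 1, 1), n)]
    return ramp_up, steady, ramp_down


up, steady, down = phase_windows([1, 2, 3, 4, 5, 6], 3)
print(up)      # [[1], [1, 2]]
print(steady)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
print(down)    # [[5, 6], [6]]
```

    Each ramp phase contributes window_size - 1 partial windows when the input is at least as long as the window.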

    Here's a function that accepts

    • Arbitrary iterables (generators are fine) as input for data
    • Arbitrary window sizes >= 1
    • Parameters to switch on/off production of values during the phases for Ramp Up/Down
    • Callback functions for those phases to control how values are produced. This can be used to constantly provide a default (e.g. None) or to provide partial averages
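
    As a rough illustration of what such callbacks might look like, each one takes the deque of currently available values and returns the number to report. The names `no_value` and `partial_average` are made up here and are not part of the answer.

```python
from collections import deque


def no_value(window):
    """Ramp callback that always reports None as a placeholder."""
    return None


def partial_average(window):
    """Ramp callback that averages only the values currently in the window."""
    return sum(window) / len(window)


window = deque([1, 2, 3])       # what a callback might receive mid-ramp
print(no_value(window))         # None
print(partial_average(window))  # 2.0
```

    Callbacks like these should only read the deque, never modify it.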

    Here's the code:

    from collections import deque 
    
    def moving_averages(data, size, rampUp=True, rampDown=True):
        """Slide a window of <size> elements over <data> to calc an average
    
        First and last <size-1> iterations when window is not yet completely
        filled with data, or the window empties due to exhausted <data>, the
        average is computed with just the available data (but still divided
        by <size>).
        Set rampUp/rampDown to False in order to not provide any values during
        those start and end <size-1> iterations.
        Set rampUp/rampDown to functions to provide arbitrary partial average
        numbers during those phases. The callback will get the currently
        available input data in a deque. Do not modify that data.
        """
        d = deque()
        running_sum = 0.0
    
        data = iter(data)
        # rampUp
        for count in range(1, size):
            try:
                val = next(data)
            except StopIteration:
                break
            running_sum += val
            d.append(val)
            #print("up: running sum:" + str(running_sum) + "  count: " + str(count) + "  deque: " + str(d))
            if rampUp:
                if callable(rampUp):
                    yield rampUp(d)
                else:
                    yield running_sum / size
    
        # steady
        exhausted_early = True
        for val in data:
            exhausted_early = False
            running_sum += val
            #print("st: running sum:" + str(running_sum) + "  deque: " + str(d))
            yield running_sum / size
            d.append(val)
            running_sum -= d.popleft()
    
        # rampDown
        if rampDown:
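            # Input ended during ramp up: drop the oldest value first.
            # The "and d" check avoids an IndexError on empty input.
            if exhausted_early and d: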
                running_sum -= d.popleft()
            for _ in range(min(len(d), size - 1), 0, -1):
                #print("dn: running sum:" + str(running_sum) + "  deque: " + str(d))
                if callable(rampDown):
                    yield rampDown(d)
                else:
                    yield running_sum / size
                running_sum -= d.popleft()
    

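    With ramp up and ramp down switched off, it seems to be roughly 5-10% faster than Martijn's version for inputs of 100 elements or more (and slightly slower for 10 elements), though Martijn's version is far more elegant. Here's the test code: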

    print("")
    print("Timeit")
    print("-" * 80)
    
    from itertools import islice
    def window(seq, n=2):
        """Return a sliding window (of width n) over data from the iterable.

        s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
        """
        it = iter(seq)
        result = tuple(islice(it, n))
        if len(result) == n:
            yield result    
        for elem in it:
            result = result[1:] + (elem,)
            yield result
    
    # Martijn's version:
    def moving_averages_SO(values, size):
        for selection in window(values, size):
            yield sum(selection) / size
    
    
    import timeit
    problems = [int(i) for i in (10, 100, 1000, 10000, 1e5, 1e6, 1e7)]
    for problem_size in problems:
        print("{:12s}".format(str(problem_size)), end="")
    
        so = timeit.repeat("list(moving_averages_SO(range("+str(problem_size)+"), 5))", number=1*max(problems)//problem_size,
                           setup="from __main__ import moving_averages_SO")
        print("{:12.3f} ".format(min(so)), end="")
    
        my = timeit.repeat("list(moving_averages(range("+str(problem_size)+"), 5, False, False))", number=1*max(problems)//problem_size,
                           setup="from __main__ import moving_averages")
        print("{:12.3f} ".format(min(my)), end="")
    
        print("")
    

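    And the output (columns: input size, Martijn's version, this version; each timing is the best total time in seconds reported by timeit.repeat):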

    Timeit
    --------------------------------------------------------------------------------
    10                 7.242        7.656 
    100                5.816        5.500 
    1000               5.787        5.244 
    10000              5.782        5.180 
    100000             5.746        5.137 
    1000000            5.745        5.198 
    10000000           5.764        5.186 
    

    The original question can now be solved with this function call:

    print(list(moving_averages(range(1,11), 5,
                               rampUp=lambda _: None,
                               rampDown=False)))
    

    The output:

    [None, None, None, None, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
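
    Note that the list in the question holds strings rather than numbers, so it presumably needs converting before averaging. A minimal sketch of that step, with a naive stand-in (`reference_averages`, a hypothetical name) used only to cross-check the output above:

```python
def reference_averages(values, size):
    """Naive stand-in: None until the window is full, then the plain mean."""
    values = list(values)
    for end in range(1, len(values) + 1):
        if end < size:
            yield None
        else:
            yield sum(values[end - size:end]) / size


y = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
numbers = [int(s) for s in y]  # convert the strings before averaging
print(list(reference_averages(numbers, 5)))
# [None, None, None, None, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```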
    
