Find max (and min) on a moving interval using Python

Submitted by 人走茶凉 on 2019-12-05 14:18:34

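Have a look at rolling windows in pandas. (Older pandas versions provided the functions pd.rolling_max and pd.rolling_min; these have since been removed in favour of the .rolling() method used here.)

>>> import pandas as pd
>>> L = [5.5, 6.0, 6.0, 6.5, 6.0, 5.5, 5.5, 5.0, 4.5]
>>> a = pd.DataFrame(L)
>>> a.rolling(3).max()
     0
0  NaN
1  NaN
2  6.0
3  6.5
4  6.5
5  6.5
6  6.0
7  5.5
8  5.5
>>> a.rolling(3).min()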
     0
0  NaN
1  NaN
2  5.5
3  6.0
4  6.0
5  5.5
6  5.5
7  5.0
8  4.5
strubbly

At first it seemed to me that this required a minimum of O(log(window_size)) operations per element of the big list (see my other answer). But @wim pointed me to the truly remarkable algorithm described by @adamax in this post:

Implement a queue in which push_rear(), pop_front() and get_min() are all constant time operations

Here's an implementation.

Running it on the suggested 100000 numbers with a 1000 window takes 0.6 seconds instead of the 60 seconds of the naive algorithm.

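class MinMaxStack(object):
    # A stack whose entries are (value, min_so_far, max_so_far), so the
    # min and max of everything on the stack are available in O(1).

    def __init__(self):
        self.stack = []

    def push(self, val):
        if not self.stack:
            self.stack = [(val, val, val)]
        else:
            _, minimum, maximum = self.stack[-1]
            if val < minimum:
                self.stack.append((val, val, maximum))
            elif val > maximum:
                self.stack.append((val, minimum, val))
            else:
                self.stack.append((val, minimum, maximum))

    def pop(self):
        return self.stack.pop()

    def get_minimax(self):
        # (min, max) of all values currently on the stack
        return self.stack[-1][1:]

    def __len__(self):
        return len(self.stack)

class RollingWindow(object):
    # A queue built from two MinMaxStacks: new values go onto push_stack,
    # old values are removed from pop_stack.

    def __init__(self):
        self.push_stack = MinMaxStack()
        self.pop_stack = MinMaxStack()

    def push_only(self, o):
        # Used while the window is still filling up.
        self.push_stack.push(o)

    def push_and_pop(self, o):
        # Add the newest value and drop the oldest one.
        self.push_stack.push(o)
        if not self.pop_stack:
            # Move everything except the oldest value across (this reverses
            # the order, so the oldest remaining value ends up on top of
            # pop_stack), then discard the oldest value.
            for _ in range(len(self.push_stack) - 1):
                self.pop_stack.push(self.push_stack.pop()[0])
            self.push_stack.pop()
        else:
            self.pop_stack.pop()

    def get_minimax(self):
        # Combine the extremes of whichever stacks are non-empty.
        if not self.pop_stack:
            return self.push_stack.get_minimax()
        elif not self.push_stack:
            return self.pop_stack.get_minimax()
        mn1, mx1 = self.pop_stack.get_minimax()
        mn2, mx2 = self.push_stack.get_minimax()
        return min(mn1, mn2), max(mx1, mx2)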



import time
import random
window = 10000
test_length = 100000
data = [random.randint(1,100) for i in range(test_length)]

s = time.time()

wr = RollingWindow()
answer1 = []
for i in range(test_length):
    if i < window:
        wr.push_only(data[i])
    else:
        wr.push_and_pop(data[i])
    answer1.append(wr.get_minimax())

print(time.time() - s)  # elapsed seconds for the two-stack window

s = time.time()
answer2 = []
for i in range(test_length):
    if i+1 < window:
        current_window = i+1
    else:
        current_window = window
    answer2.append((min(data[i+1-current_window:i+1]),max(data[i+1-current_window:i+1])))

print(time.time() - s)  # elapsed seconds for the naive slicing version

if answer1 != answer2:
    print("Test Fail")

Some small performance improvements are possible. This version continually grows and shrinks the Python list used as a stack; it is slightly faster never to shrink it and to track an end pointer instead, but only by a few percent. If you were really desperate for a few more percent, you could merge the two stacks into the window class to reduce the indirection in the calls. I built an optimised version that replaces the lists with collections.deque and inlines the stack code, and got it down to 0.32 seconds.

If even more speed were required, this would be fairly easy to code up in C or Cython, particularly for a fixed window size, and especially if you could restrict the type of the values on the stacks.
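For comparison, another technique that is also amortized O(1) per element is the monotonic deque approach. The sketch below is a stand-alone illustration of that technique, not the optimised deque version mentioned above, and the function name rolling_minmax is hypothetical. It assumes values is an indexable sequence and only yields results for full windows.

```python
from collections import deque


def rolling_minmax(values, window):
    """Yield (min, max) for each full window using two monotonic deques.

    Each deque stores indices: min_q keeps their values in increasing order,
    max_q in decreasing order, so the front of each is the current extreme.
    Every index is appended and removed at most once per deque, which gives
    amortized O(1) work per element.
    """
    min_q = deque()
    max_q = deque()
    for i, x in enumerate(values):
        # Drop candidates that can never be the minimum/maximum again.
        while min_q and values[min_q[-1]] >= x:
            min_q.pop()
        while max_q and values[max_q[-1]] <= x:
            max_q.pop()
        min_q.append(i)
        max_q.append(i)
        # Evict the index that has just slid out of the window.
        if min_q[0] <= i - window:
            min_q.popleft()
        if max_q[0] <= i - window:
            max_q.popleft()
        if i >= window - 1:
            yield values[min_q[0]], values[max_q[0]]
```

For the example list L, list(rolling_minmax(L, 3)) gives the same pairs as the pandas rolling min and max, without the two leading NaN rows.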

Veeresh Aradhya
l = [5.5, 6.0, 6.0, 6.5, 6.0, 5.5, 5.5, 5.0, 4.5]

windowSize = 3

for i in range(0, len(l) - windowSize + 1):
    print(max(l[i:i + windowSize]))

output:

6.0
6.5
6.5
6.5
6.0
5.5
5.5
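The same slicing idea extends to returning both extremes at once. A small sketch follows; the name naive_rolling_minmax is illustrative, not from the answer. Because every window is rescanned from scratch it costs O(len(values) * window), so it is mainly useful as a correctness check for faster methods.

```python
def naive_rolling_minmax(values, window):
    """Return (min, max) for every full window by slicing.

    Each window is scanned from scratch, so the cost is
    O(len(values) * window).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        (min(values[i:i + window]), max(values[i:i + window]))
        for i in range(len(values) - window + 1)
    ]
```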

This is a rolling window, which can be implemented in pandas as the other answer shows.

If, however, you want to implement it yourself, the following code will be of assistance. It could be optimised further and could be more Pythonic, but it should give a good understanding of what is happening in the algorithm.

Initially, the minimum and maximum values are found for the starting window. Once this is initialised, we treat the sub-array as a queue, and only two values matter at each step: the new value being added and the old value being dropped.

If the old value is the current minimum or maximum, we recalculate that minimum or maximum over the new window; otherwise we just check whether the new value is a new maximum or minimum.

def updateMinMaxValues(minVal, maxVal, val):
    # Widen the current (min, max) pair to include val.
    if val < minVal:
        minVal = val
    if val > maxVal:
        maxVal = val
    return minVal, maxVal

values = [5.5, 6.0, 6.0, 6.5, 6.0, 5.5, 5.5, 5.0, 4.5]
windowSize = 3
# Extremes of the first full window.
minVal, maxVal = min(values[:windowSize]), max(values[:windowSize])

print(minVal, maxVal)
for stepIndex in range(windowSize, len(values)):
    oldVal, newVal = values[stepIndex - windowSize], values[stepIndex]
    # If the departing value was an extreme, rescan the window after the slide.
    if oldVal == minVal:
        minVal = min(values[stepIndex - windowSize + 1:stepIndex + 1])
    if oldVal == maxVal:
        maxVal = max(values[stepIndex - windowSize + 1:stepIndex + 1])
    minVal, maxVal = updateMinMaxValues(minVal, maxVal, newVal)
    print(minVal, maxVal)

results in:

5.5 6.0
6.0 6.5
6.0 6.5
5.5 6.5
5.5 6.0
5.0 5.5
4.5 5.5
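The recalculation step decides the cost of this approach. A rescan happens whenever the value leaving the window is the current minimum or maximum, so on monotonic input (for example a steadily falling series) it happens at every step and the work grows to O(window_size) per element. The sketch below restates the loop above as a function and counts the rescans; the function name rolling_minmax_recompute and the counter are illustrative additions, not part of the answer.

```python
def rolling_minmax_recompute(values, window):
    """Rolling (min, max) that rescans the window only when an extreme leaves.

    Returns the list of (min, max) pairs and the number of rescans performed.
    """
    lo, hi = min(values[:window]), max(values[:window])
    pairs = [(lo, hi)]
    rescans = 0
    for step in range(window, len(values)):
        old, new = values[step - window], values[step]
        current = values[step - window + 1:step + 1]  # window after the slide
        if old == lo:
            lo = min(current)
            rescans += 1
        if old == hi:
            hi = max(current)
            rescans += 1
        # Include the incoming value (harmless if a rescan already saw it).
        lo, hi = min(lo, new), max(hi, new)
        pairs.append((lo, hi))
    return pairs, rescans


# A strictly falling series: the maximum leaves the window at every step.
pairs, rescans = rolling_minmax_recompute(list(range(10, 0, -1)), 3)
print(rescans)
```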
strubbly

Not sure if there is a way to efficiently exploit the slow-moving structure of the number stream.

I decided the best general way to do this is with priority queues. I've left my description of how to do that below. It costs O(log(window_size)) per number entering the window.

However, the comment by wim on the original post points out that there is an amortized O(1) algorithm, described in this post: Implement a queue in which push_rear(), pop_front() and get_min() are all constant time operations

Simply maintaining one of these which keeps the min and max is going to be the best solution by far.

But for reference here is my attempt:

Maintain a pair of priority queues, one for the max and one for the min, and add and remove an entry from each at every step. This adds quite a bit of overhead for each new entry [O(log(window_size))], but it has nice smooth behaviour per entry and good overall efficiency.

The Python heapq module is the usual way to implement a priority queue in Python. However, it does not directly support removing entries or modifying their priority. This can be done by adding a dictionary index from number to position in the queue, with no increase in computational complexity. To remove an entry, you can update its number to extremely low (or high, respectively) and sift it up to restore the heap property, so that it moves to the top and can be popped off.

Here's an example that looks OK, though I haven't tested it:

http://code.activestate.com/recipes/522995-priority-dict-a-priority-queue-with-updatable-prio/

You will need to disambiguate entries with the same value in the dictionary, or to keep multiple values per key, so that you can find all the instances when the time comes to remove them.
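A simpler variant that avoids the dictionary index altogether is lazy deletion. This is a swapped-in alternative, not the indexed approach described above: each entry is tagged with its position in the stream, and entries are discarded only once they surface at the top of a heap after leaving the window. Tagging by position also sidesteps the problem of duplicate values. Stale entries can accumulate, so a heap may grow beyond the window size (up to the whole stream in the worst case), making each step O(log n) rather than O(log(window_size)). The function name rolling_minmax_heaps is illustrative.

```python
import heapq


def rolling_minmax_heaps(values, window):
    """Rolling (min, max) using two heaps with lazy deletion.

    Entries are stored as (key, index). Instead of removing an entry when it
    leaves the window, stale entries are discarded only when they reach the
    top of a heap. Max-heap behaviour is obtained by negating the key.
    """
    min_heap = []  # (value, index)
    max_heap = []  # (-value, index)
    result = []
    for i, x in enumerate(values):
        heapq.heappush(min_heap, (x, i))
        heapq.heappush(max_heap, (-x, i))
        if i < window - 1:
            continue  # window not full yet
        start = i - window + 1  # oldest index still inside the window
        # Pop entries whose index has slid out of the window; the entry for
        # index i is always present, so the heaps never run empty here.
        while min_heap[0][1] < start:
            heapq.heappop(min_heap)
        while max_heap[0][1] < start:
            heapq.heappop(max_heap)
        result.append((min_heap[0][0], -max_heap[0][0]))
    return result
```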
