I'm looking for a way to do something like the various rolling_* functions of pandas, but I want the window of the rolling computation to be defin
Based on BrenBarn's answer, but sped up by using label-based indexing rather than boolean indexing:
import pandas as pd

def rollBy(what, basis, window, func, *args, **kwargs):
    # note that basis must be sorted in order for this to work properly
    indexed_what = pd.Series(what.values, index=basis.values)
    def applyToWindow(val):
        # using slice_indexer rather than what.loc[val:val+window] allows
        # window limits that are not specifically in the index
        indexer = indexed_what.index.slice_indexer(val, val + window, 1)
        # slice_indexer returns a positional slice, so apply it with .iloc
        chunk = indexed_what.iloc[indexer]
        return func(chunk, *args, **kwargs)
    rolled = basis.apply(applyToWindow)
    return rolled
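To see what slice_indexer is doing inside applyToWindow, here is a minimal sketch on toy data. The values are made up for illustration, and I'm assuming a sorted float basis as the comment above requires.

```python
import pandas as pd

# Hypothetical toy data: a sorted basis and the values to aggregate.
basis = pd.Series([0.5, 1.0, 2.5, 4.0, 7.0])
what = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Same re-indexing step as in rollBy: values labelled by the basis.
indexed_what = pd.Series(what.values, index=basis.values)

# Ask for the labels between 1.0 and 4.5 (both ends inclusive).
# 4.5 is not in the index; because the index is sorted, the bounds
# are still located and come back as a positional slice.
indexer = indexed_what.index.slice_indexer(1.0, 4.5)

# Apply the positional slice with .iloc to get the window's values.
chunk = indexed_what.iloc[indexer]
window_sum = chunk.sum()  # 2.0 + 3.0 + 4.0
```

The slice covers positions 1 to 3 (labels 1.0, 2.5 and 4.0), so the window sum is 9.0.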
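The label-based version is much faster than the boolean-mask version. In the timings below (total seconds for three runs), rollBy_Ian is the function above and rollBy_Bren is BrenBarn's version: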
In [46]: df = pd.DataFrame({"RollBasis":np.random.uniform(0,1000000,100000), "ToRoll": np.random.uniform(0,10,100000)})
In [47]: df = df.sort_values("RollBasis")
In [48]: timeit("rollBy_Ian(df.ToRoll,df.RollBasis,10,sum)",setup="from __main__ import rollBy_Ian,df", number =3)
Out[48]: 67.6615059375763
In [49]: timeit("rollBy_Bren(df.ToRoll,df.RollBasis,10,sum)",setup="from __main__ import rollBy_Bren,df", number =3)
Out[49]: 515.0221037864685
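It's worth noting that the index-based solution is close to O(n) (strictly, each window costs a binary search on the sorted index, plus func over the chunk), while the boolean-mask version is O(n^2), because it scans the whole column for every row (I think).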
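To make the difference concrete, here is a rough sketch of the two approaches side by side. Both functions are my own illustrations: roll_by_mask is an assumed boolean-mask variant (not necessarily BrenBarn's exact code), and roll_by_search swaps slice_indexer for np.searchsorted, which does the same kind of binary search on a sorted array.

```python
import numpy as np
import pandas as pd

def roll_by_mask(what, basis, window, func):
    # Assumed boolean-mask variant: each row compares against the
    # whole basis column, so the total work grows like n * n.
    def apply_to_window(val):
        mask = (basis >= val) & (basis <= val + window)
        return func(what[mask])
    return basis.apply(apply_to_window)

def roll_by_search(what, basis, window, func):
    # Binary search on the sorted basis finds each window's bounds
    # in O(log n); only the rows inside the window are then touched.
    b = basis.to_numpy()
    w = what.to_numpy()
    starts = np.searchsorted(b, b, side="left")
    stops = np.searchsorted(b, b + window, side="right")
    results = [func(w[i:j]) for i, j in zip(starts, stops)]
    return pd.Series(results, index=basis.index)

# Small random frame, sorted by the basis column as both versions require.
rng = np.random.default_rng(0)
df = pd.DataFrame({"RollBasis": rng.uniform(0, 1000, 200),
                   "ToRoll": rng.uniform(0, 10, 200)})
df = df.sort_values("RollBasis")

by_mask = roll_by_mask(df.ToRoll, df.RollBasis, 10, np.sum)
by_search = roll_by_search(df.ToRoll, df.RollBasis, 10, np.sum)
same = np.allclose(by_mask.to_numpy(), by_search.to_numpy())
```

Both produce the same rolling sums; the difference is only in how much of the column each window has to look at.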
I find it more useful to do this over evenly spaced windows running from the minimum to the maximum of basis, rather than starting a window at every value of basis. This means altering the function thus:
import numpy as np
import pandas as pd

def rollBy(what, basis, window, func, *args, **kwargs):
    # note that basis must be sorted in order for this to work properly
    windows_min = basis.min()
    windows_max = basis.max()
    window_starts = np.arange(windows_min, windows_max, window)
    window_starts = pd.Series(window_starts, index=window_starts)
    indexed_what = pd.Series(what.values, index=basis.values)
    def applyToWindow(val):
        # using slice_indexer rather than what.loc[val:val+window] allows
        # window limits that are not specifically in the index
        indexer = indexed_what.index.slice_indexer(val, val + window, 1)
        # slice_indexer returns a positional slice, so apply it with .iloc
        chunk = indexed_what.iloc[indexer]
        return func(chunk, *args, **kwargs)
    rolled = window_starts.apply(applyToWindow)
    return rolled
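One thing to be aware of with the evenly spaced version: the label slice is inclusive at both ends, so a basis value that falls exactly on a window boundary lands in two adjacent windows. A small sketch with made-up data (the numbers are just for illustration) that counts the rows per window:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: sorted basis, one row per basis value.
basis = pd.Series([0.0, 1.0, 2.0, 3.5, 4.0, 6.0])
what = pd.Series([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
window = 2.0

# Window starts run from min(basis) up to, but not including, max(basis).
window_starts = np.arange(basis.min(), basis.max(), window)
indexed_what = pd.Series(what.values, index=basis.values)

# Count the rows that fall in each window [start, start + window].
counts = {}
for start in window_starts:
    indexer = indexed_what.index.slice_indexer(start, start + window)
    counts[float(start)] = int(indexed_what.iloc[indexer].count())
```

Here the windows start at 0.0, 2.0 and 4.0 and hold 3, 3 and 2 rows, which adds up to 8 for only 6 rows, because 2.0 and 4.0 are each counted twice. If that matters for your func, one option is to shrink the upper limit slightly or drop the boundary row from the chunk.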