Consider the pd.Series s:
import pandas as pd
import numpy as np
np.random.seed([3,1415])
s = pd.Series(np.random.randint(0, 10, 10), list('abcdefghij'))
s
a 0
b 2
c 7
d 3
e 8
f 7
g 0
h 6
i 8
j 6
dtype: int64
I want to get the index for the max value for the rolling window of 3
s.rolling(3).max()
a NaN
b NaN
c 7.0
d 7.0
e 8.0
f 8.0
g 8.0
h 7.0
i 8.0
j 8.0
dtype: float64
What I want is
a None
b None
c c
d c
e e
f e
g e
h f
i i
j i
dtype: object
What I've done
s.rolling(3).apply(np.argmax)
a NaN
b NaN
c 2.0
d 1.0
e 2.0
f 1.0
g 0.0
h 0.0
i 2.0
j 1.0
dtype: float64
which is obviously not what I want: these are positions within each window, not index labels.
There is no simple way to do that, because the argument that is passed to the rolling-applied function is a plain numpy array, not a pandas Series, so it doesn't know about the index. Moreover, the rolling functions must return a float result, so they can't directly return the index values if they're not floats.
Here is one approach:
>>> s.index[s.rolling(3).apply(np.argmax)[2:].astype(int)+np.arange(len(s)-2)]
Index([u'c', u'c', u'e', u'e', u'e', u'f', u'i', u'i'], dtype='object')
The idea is to take the argmax values and align them with the series by adding a value indicating how far along in the series we are. (That is, for the first argmax value we add zero, because it is giving us the index into a subsequence starting at index 0 in the original series; for the second argmax value we add one, because it is giving us the index into a subsequence starting at index 1 in the original series; etc.)
This gives the correct results, but doesn't include the two "None" values at the beginning; you'd have to add those back manually if you wanted them.
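If you do want the leading None values back, here is a minimal sketch of one way to do it. The helper name rolling_idxmax is made up for illustration, the series is rebuilt from the literal values shown in the question, and it assumes a pandas version where rolling(...).apply accepts the raw keyword, so that np.argmax receives a plain NumPy array.

```python
import numpy as np
import pandas as pd

def rolling_idxmax(s, window):
    """Hypothetical helper: index label of the max in each trailing window."""
    # raw=True passes each full window to np.argmax as a NumPy array;
    # incomplete windows are never passed and come back as NaN.
    rel = s.rolling(window).apply(np.argmax, raw=True).to_numpy()
    # Positions where a full window ends.
    ends = np.flatnonzero(~np.isnan(rel))
    # A window ending at position e starts at e - (window - 1);
    # adding the in-window argmax gives the absolute position of the max.
    pos = ends - (window - 1) + rel[ends].astype(int)
    # Start from all None, then fill in the labels where a full window exists.
    out = pd.Series([None] * len(s), index=s.index, dtype=object)
    out.iloc[ends] = s.index[pos].to_numpy()
    return out

s = pd.Series([0, 2, 7, 3, 8, 7, 0, 6, 8, 6], index=list("abcdefghij"))
result = rolling_idxmax(s, 3)
print(result)
```

On ties, np.argmax returns the first position, so the earliest label in the window wins.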
There is an open pandas issue to add rolling idxmax.
Here's an approach using broadcasting:
maxidx = (s.values[np.arange(s.size-3+1)[:,None] + np.arange(3)]).argmax(1)
out = s.index[maxidx+np.arange(maxidx.size)]
This generates all the indices corresponding to the rolling windows, indexes into the extracted array version with those, and thus gets the max indices for each window. For more efficient indexing, we can use NumPy strides, like so:
arr = s.values
n = arr.strides[0]
maxidx = np.lib.stride_tricks.as_strided(
    arr, shape=(s.size-3+1, 3), strides=(n, n)).argmax(1)
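The strided snippet stops at maxidx, so the labels still have to be looked up with the same offset as in the broadcasting version. As a sketch of the whole thing, assuming NumPy 1.20 or later, sliding_window_view builds the same kind of windowed view without hand-computing shape and strides. This is a swapped-in alternative to as_strided, not the code from the answer, and the series is rebuilt from the literal values in the question.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

s = pd.Series([0, 2, 7, 3, 8, 7, 0, 6, 8, 6], index=list("abcdefghij"))
w = 3  # window length

# Read-only view with one row per full window: shape (len(s) - w + 1, w).
windows = sliding_window_view(s.to_numpy(), w)
# Position of the max inside each window.
maxidx = windows.argmax(axis=1)
# Window k starts at position k, so add k to get the position in s.
labels = s.index[maxidx + np.arange(maxidx.size)]
print(labels)
```

The result has len(s) - w + 1 entries, one per full window, so it lines up with s.index[w-1:].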
I used a generator
def idxmax(s, w):
    i = 0
    while i + w <= len(s):
        yield s.iloc[i:i+w].idxmax()
        i += 1
pd.Series(idxmax(s, 3), s.index[2:])
c c
d c
e e
f e
g e
h f
i i
j i
dtype: object
You can also simulate the rolling window by creating a DataFrame and using idxmax as follows:
window_values = pd.DataFrame({0: s, 1: s.shift(), 2: s.shift(2)})
s.index[np.arange(len(s)) - window_values.idxmax(1)]
Index(['a', 'b', 'c', 'c', 'e', 'e', 'e', 'f', 'i', 'i'], dtype='object', name=0)
As you can see, the first two terms are the idxmax as applied to the initial windows of lengths 1 and 2 rather than null values.
It's not as efficient as the accepted answer and probably not a good idea for large windows, but it's another perspective.
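If you would rather have nulls for the incomplete leading windows, here is a rough sketch of the same shifted-DataFrame idea for an arbitrary window length. The variable names w, steps_back, and incomplete are my own, and the series is rebuilt from the literal values in the question.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 2, 7, 3, 8, 7, 0, 6, 8, 6], index=list("abcdefghij"))
w = 3  # window length

# Column k holds the value k steps back, so each row is one trailing window.
window_values = pd.DataFrame({k: s.shift(k) for k in range(w)})
# idxmax(axis=1) returns the column label, i.e. how many steps back the max is.
steps_back = window_values.idxmax(axis=1).to_numpy()
found = s.index[np.arange(len(s)) - steps_back].to_numpy()
# Rows containing NaN come from incomplete leading windows; use None there.
incomplete = window_values.isna().any(axis=1).to_numpy()
labels = pd.Series(np.where(incomplete, None, found), index=s.index, dtype=object)
print(labels)
```

One difference to be aware of: idxmax picks the first column on ties, which here is the most recent value, whereas argmax over a window picks the earliest one.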
Source: https://stackoverflow.com/questions/40101130/how-do-i-calculate-a-rolling-idxmax