how do I calculate a rolling idxmax

别来无恙 提交于 2019-12-28 06:25:11

问题


consider the pd.Series s

import pandas as pd
import numpy as np

np.random.seed([3,1415])
s = pd.Series(np.random.randint(0, 10, 10), list('abcdefghij'))
s

a    0
b    2
c    7
d    3
e    8
f    7
g    0
h    6
i    8
j    6
dtype: int64

I want to get the index for the max value for the rolling window of 3

s.rolling(3).max()

a    NaN
b    NaN
c    7.0
d    7.0
e    8.0
f    8.0
g    8.0
h    7.0
i    8.0
j    8.0
dtype: float64

What I want is

a    None
b    None
c       c
d       c
e       e
f       e
g       e
h       f
i       i
j       i
dtype: object

What I've done

s.rolling(3).apply(np.argmax)

a    NaN
b    NaN
c    2.0
d    1.0
e    2.0
f    1.0
g    0.0
h    0.0
i    2.0
j    1.0
dtype: float64

which is obviously not what I want


回答1:


There is no simple way to do that, because the argument that is passed to the rolling-applied function is a plain numpy array, not a pandas Series, so it doesn't know about the index. Moreover, the rolling functions must return a float result, so they can't directly return the index values if they're not floats.

Here is one approach:

>>> s.index[s.rolling(3).apply(np.argmax)[2:].astype(int)+np.arange(len(s)-2)]
Index([u'c', u'c', u'e', u'e', u'e', u'f', u'i', u'i'], dtype='object')

The idea is to take the argmax values and align them with the series by adding a value indicating how far along in the series we are. (That is, for the first argmax value we add zero, because it is giving us the index into a subsequence starting at index 0 in the original series; for the second argmax value we add one, because it is giving us the index into a subsequence starting at index 1 in the original series; etc.)

This gives the correct results, but doesn't include the two "None" values at the beginning; you'd have to add those back manually if you wanted them.

There is an open pandas issue to add rolling idxmax.




回答2:


Here's an approach using broadcasting -

maxidx = (s.values[np.arange(s.size-3+1)[:,None] + np.arange(3)]).argmax(1)
out = s.index[maxidx+np.arange(maxidx.size)]

This generates all the indices corresponding to the rolling windows, indexes into the extracted array version with those and thus gets the max indices for each window. For a more efficient indexing, we can use NumPy strides, like so -

arr = s.values
n = arr.strides[0]
maxidx = np.lib.stride_tricks.as_strided(arr, \
                   shape=(s.size-3+1,3), strides=(n,n)).argmax(1)



回答3:


I used a generator

def idxmax(s, w):
    i = 0
    while i + w <= len(s):
        yield(s.iloc[i:i+w].idxmax())
        i += 1

pd.Series(idxmax(s, 3), s.index[2:])

c    c
d    c
e    e
f    e
g    e
h    f
i    i
j    i
dtype: object



回答4:


You can also simulate the rolling window by creating a DataFrame and use idxmax as follows:

window_values = pd.DataFrame({0: s, 1: s.shift(), 2: s.shift(2)})
s.index[np.arange(len(s)) - window_values.idxmax(1)]

Index(['a', 'b', 'c', 'c', 'e', 'e', 'e', 'f', 'i', 'i'], dtype='object', name=0)

As you can see, the first two terms are the idxmax as applied to the initial windows of lengths 1 and 2 rather than null values. It's not as efficient as the accepted answer and probably not a good idea for large windows but just another perspective.




回答5:


Just chiming in on how I solved a similar problem that I had. I did not want to find the index exactly, I wanted to find how long ago the max value happened. But this could be used to find the index as well.

I'm basically using the shift strategy, but I'm iterating over several shifts with a configurable length. It's probably slow, but it works good enough for me.

import pandas as pd


length = 5

data = [1, 2, 3, 4, 5, 4, 3, 4, 5, 6, 7, 6, 5, 4, 5, 4, 3]
df = pd.DataFrame(data, columns=['number'])
df['helper_max'] = df.rolling(length).max()

for i in range(length, -1, -1):
    # Set the column to what you want. You may grab the index 
    # if you wish, I wanted number of rows since max happened
    df.loc[df['number'].shift(i) == df['helper_max'], 'n_rows_ago_since_max'] = i

print(df)

Output:

    number  helper_max  n_rows_ago_since_max
0        1         NaN                   NaN
1        2         NaN                   NaN
2        3         NaN                   NaN
3        4         NaN                   NaN
4        5         5.0                   0.0
5        4         5.0                   1.0
6        3         5.0                   2.0
7        4         5.0                   3.0
8        5         5.0                   0.0
9        6         6.0                   0.0
10       7         7.0                   0.0
11       6         7.0                   1.0
12       5         7.0                   2.0
13       4         7.0                   3.0
14       5         7.0                   4.0
15       4         6.0                   4.0
16       3         5.0                   2.0


来源:https://stackoverflow.com/questions/40101130/how-do-i-calculate-a-rolling-idxmax

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!