问题
Assuming I have a Pandas dataframe similar to the below, how would I get the rolling correlation (for 2 days in this example) between 2 specific columns and group by the 'ID' column? I am familiar with the Pandas rolling_corr() function but I cannot figure out how to combine that with the groupby() clause.
What I have:
ID Date Val1 Val2
A 1-Jan 45 22
A 2-Jan 15 66
A 3-Jan 55 13
B 1-Jan 41 12
B 2-Jan 87 45
B 3-Jan 82 66
C 1-Jan 33 34
C 2-Jan 15 67
C 3-Jan 46 22
What I need:
ID Date Val1 Val2 Rolling_Corr
A 1-Jan 45 22
A 2-Jan 15 66 0.1
A 3-Jan 55 13 0.16
B 1-Jan 41 12
B 2-Jan 87 45 0.15
B 3-Jan 82 66 0.05
C 1-Jan 33 34
C 2-Jan 15 67 0.09
C 3-Jan 46 22 0.11
Thanks!
回答1:
You can actually start with the simple approach here: Pandas Correlation Groupby
and then add rolling(3)
like this:
df.groupby('ID')[['Val1','Val2']].rolling(3).corr()
I've changed the window from 2 to 3 because you'll only get 1 or -1 with a window size of 2. Unfortunately, that output (not shown) is a bit verbose because it outputs a 2x2 correlation matrix when all you need is a scalar. But with an additional line you can make the output more concise:
df2 = df.groupby('ID')[['Val1','Val2']].rolling(3).corr()
df2.groupby(level=[0,1]).last()['Val1']
ID
A 0 NaN
1 NaN
2 -0.996539
B 3 NaN
4 NaN
5 0.879868
C 6 NaN
7 NaN
8 -0.985529
来源:https://stackoverflow.com/questions/28998998/rolling-correlation-with-groupby-in-pandas