Question
I have a CSV file like this:
date,sym,close
2014.01.01,A,10
2014.01.02,A,11
2014.01.03,A,12
2014.01.04,A,13
2014.01.01,B,20
2014.01.02,B,22
2014.01.03,B,23
2014.01.01,C,33
2014.01.02,C,32
2014.01.03,C,31
Then I get a DataFrame named df via the read_csv function:
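import numpy as np
import pandas as pd
df = pd.read_csv('daily.csv', index_col=[0])  # the 'date' column becomes the index
# func (not shown) is my own function that returns a NumPy array for each sym
groups = df.groupby('sym')[['close']].apply(lambda x: func(x['close'].values))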
The groups look like this:
sym
A [nan,1.00,2.00,...]
B [nan,1.00,2.00,...]
C [nan,1.00,2.00,...]
How can I calculate the correlation between each pair of syms?
AA, AB, AC, BA, BB, BC, CA, CB, CC
BTW, the number of items for each sym may not be the same.
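Taken literally, the question asks for one coefficient per ordered pair. A minimal sketch of that, assuming the sample rows above stand in for daily.csv: build one close series per sym indexed by date, then loop over every pair with itertools.product. Series.corr computes the Pearson coefficient after aligning the two series on their index and dropping dates that either one lacks, which is what lets unequal item counts work. The names CSV_TEXT, series and pair_corr are illustrative only.

```python
import io
from itertools import product

import pandas as pd

# The sample rows from the question, standing in for daily.csv.
CSV_TEXT = """date,sym,close
2014.01.01,A,10
2014.01.02,A,11
2014.01.03,A,12
2014.01.04,A,13
2014.01.01,B,20
2014.01.02,B,22
2014.01.03,B,23
2014.01.01,C,33
2014.01.02,C,32
2014.01.03,C,31
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# One close series per sym, indexed by date so that pairs can be aligned.
series = {sym: g.set_index('date')['close'] for sym, g in df.groupby('sym')}

pair_corr = {}
for a, b in product(sorted(series), repeat=2):
    # Series.corr aligns both series on the date index and skips missing dates.
    pair_corr[a + b] = series[a].corr(series[b])

for name, value in pair_corr.items():
    print(name, round(value, 6))
```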
回答1:
With df as above, make a pivot table:
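# reset_index() makes 'date' a column again; pivot() takes keyword arguments only in pandas 2.x
dfp = df.reset_index().pivot(index='date', columns='sym')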
print(dfp)
           close
sym            A    B   C
date
2014-01-01    10   20  33
2014-01-02    11   22  32
2014-01-03    12   23  31
2014-01-04    13  NaN  30
pandas will then calculate the pairwise correlation coefficients (Pearson by default), excluding missing values pair by pair:
print(dfp.corr())
             close
sym              A         B         C
      sym
close A   1.000000  0.981981 -1.000000
      B   0.981981  1.000000 -0.981981
      C  -1.000000 -0.981981  1.000000
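A self-contained sketch of this answer for current pandas, using the question's rows read from a string instead of daily.csv (CSV_TEXT is an illustrative name). In pandas 2.x DataFrame.pivot accepts keyword arguments only, and passing values='close' yields plain A/B/C columns rather than the ('close', sym) MultiIndex. Because DataFrame.corr drops missing values pair by pair, B (no 2014.01.04 row) is compared with A over their three shared dates. The question's data also has no C value for 2014.01.04, unlike the table above, but the coefficients come out the same. The min_periods=3 threshold is an assumption for illustration, not part of the original answer; pairs with fewer overlapping dates would become NaN.

```python
import io

import pandas as pd

# The sample rows from the question, standing in for daily.csv.
CSV_TEXT = """date,sym,close
2014.01.01,A,10
2014.01.02,A,11
2014.01.03,A,12
2014.01.04,A,13
2014.01.01,B,20
2014.01.02,B,22
2014.01.03,B,23
2014.01.01,C,33
2014.01.02,C,32
2014.01.03,C,31
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Rows: dates, columns: syms; missing (date, sym) combinations become NaN.
dfp = df.pivot(index='date', columns='sym', values='close')

# Pairwise Pearson correlation; each pair uses only the dates both syms have.
# min_periods=3 is an illustrative threshold, not from the original answer.
corr = dfp.corr(min_periods=3)
print(corr.round(6))
```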
But if you want to prettify it, check out seaborn:
import seaborn as sns
# sns.corrplot no longer exists in current seaborn; heatmap() draws the annotated matrix
sns.heatmap(dfp.corr(), annot=True)

Answer 2:
After getting groups:
sym
A [nan,1.00,2.00,...]
B [nan,1.00,2.00,...]
C [nan,1.00,2.00,...]
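I created a DataFrame df2:
df2 = pd.DataFrame()
# assigning raw arrays works only if they all have the same length;
# otherwise pandas raises a ValueError about the length mismatch
df2['A'] = groups['A']
df2['B'] = groups['B']
df2['C'] = groups['C']
df2.corr()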
This gets the correlation from the data in groups, but it is not perfect, because every column is added by hand. How can groups be converted to a DataFrame like this in general? By looping over the keys of groups? I need to keep trying.
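A sketch of the loop over the keys of groups. Here groups is a hypothetical dict holding the question's close prices as one NumPy array per sym; a pandas Series of arrays, like the groups shown above, works the same way because Series.items() also yields (key, value) pairs. Wrapping each array in pd.Series lets pd.DataFrame pad the shorter ones with NaN instead of failing on the length mismatch. Keep in mind that rows are then matched by position within each array, not by date, so the result agrees with the pivot-table answer only when every sym starts on the same date with no gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `groups`: sym -> NumPy array, unequal lengths.
groups = {
    'A': np.array([10.0, 11.0, 12.0, 13.0]),
    'B': np.array([20.0, 22.0, 23.0]),
    'C': np.array([33.0, 32.0, 31.0]),
}

# Loop over the keys once instead of adding each column by hand.
# pd.Series lets the shorter arrays be padded with NaN at the end.
df2 = pd.DataFrame({sym: pd.Series(values) for sym, values in groups.items()})

print(df2)
print(df2.corr())
```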
Source: https://stackoverflow.com/questions/29631240/pandas-correlation-matrix-between-each-pair-groupby-item