Calculate correlation between all columns of a DataFrame and all columns of another DataFrame?

Asked by 你的背包 on 2020-12-15 09:53

I have a DataFrame object stocks filled with stock returns. I have another DataFrame object industries filled with industry returns. I want to find

4 Answers
  • 2020-12-15 09:59

    Here's a one-liner that uses apply on the columns and avoids the nested for loops. The main benefit is that apply collects the results into a DataFrame, with the columns of df2 as the index and the columns of df1 as the columns.

    df1.apply(lambda s: df2.corrwith(s))
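
    To make the orientation of the result concrete, here is a minimal sketch on made-up data. The random frames and the column names s1, s2, i1, i2, i3 are assumptions for illustration only, not the asker's stocks and industries data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 250 rows of random "returns".
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.normal(size=(250, 2)), columns=["s1", "s2"])        # stand-in for stocks
df2 = pd.DataFrame(rng.normal(size=(250, 3)), columns=["i1", "i2", "i3"])  # stand-in for industries

# For each column s of df1, corrwith correlates every column of df2 with s.
result = df1.apply(lambda s: df2.corrwith(s))

# Rows are labelled by df2's columns, columns by df1's columns.
print(result.shape)
print(result)
```

    Swapping df1 and df2 in the expression should give the transposed layout, if that orientation is preferred.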
    
  • 2020-12-15 10:02

    Here's a slightly simpler answer than JohnE's that uses pandas natively instead of using numpy.corrcoef. As an added bonus, you don't have to retrieve the correlation value out of a silly 2x2 correlation matrix, because pandas's series-to-series correlation function simply returns a number, not a matrix.

    for s in ['s1','s2']:
        for i in ['i1','i2']:
            print(df1[s].corr(df2[i]))
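
    The difference between the two return types can be checked with a small sketch. The two random Series below are assumed placeholders, not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two unrelated random series of equal length.
rng = np.random.default_rng(1)
a = pd.Series(rng.normal(size=100))
b = pd.Series(rng.normal(size=100))

r_pandas = a.corr(b)         # a single scalar
m_numpy = np.corrcoef(a, b)  # 2x2 matrix; the off-diagonal entry is the correlation

print(r_pandas)
print(m_numpy.shape)
print(np.isclose(r_pandas, m_numpy[0, 1]))
```

    One practical difference to keep in mind: Series.corr skips pairs with missing values, whereas np.corrcoef returns NaN when the inputs contain NaN.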
    
  • 2020-12-15 10:15

    (Edit to add: Instead of this answer please check out @yt's answer which was added later but is clearly better.)

    You could go with numpy.corrcoef() which is basically the same as corr in pandas, but the syntax may be more amenable to what you want.

    for s in ['s1','s2']:
        for i in ['i1','i2']:
            print( 'corrcoef',s,i,np.corrcoef(df1[s],df2[i])[0,1] )
    

    That prints:

    corrcoef s1 i1 -0.00416977553597
    corrcoef s1 i2 -0.0096393047035
    corrcoef s2 i1 -0.026278689352
    corrcoef s2 i2 -0.00402030582064
    

    Alternatively you could load the results into a dataframe with appropriate labels:

    # DataFrame.append was removed in pandas 2.0, so collect the values first
    values = {}
    for s in ['s1','s2']:
        for i in ['i1','i2']:
            values[s+'_'+i] = np.corrcoef(df1[s],df2[i])[0,1]
    cc = pd.Series(values, name='corrcoef').to_frame()
    

    Which looks like this:

           corrcoef
    s1_i1 -0.004170
    s1_i2 -0.009639
    s2_i1 -0.026279
    s2_i2 -0.004020
    
  • 2020-12-15 10:19

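    Quite late, but here is a more general solution that works for any number of columns, as long as both DataFrames have the same number of rows: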

    def corrmatrix(df1,df2):
        s = df1.values.shape[1]
        cr = np.corrcoef(df1.values.T,df2.values.T)[s:,:s]    
        return pd.DataFrame(cr,index = df2.columns,columns = df1.columns)
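
    As a sanity check, the function can be compared against the apply/corrwith approach from the other answer. This is only a sketch on made-up random data with no missing values; the frames and column names are assumptions.

```python
import numpy as np
import pandas as pd

def corrmatrix(df1, df2):
    # np.corrcoef stacks df1's variables first, then df2's, so the
    # lower-left block holds the df2-versus-df1 correlations.
    s = df1.values.shape[1]
    cr = np.corrcoef(df1.values.T, df2.values.T)[s:, :s]
    return pd.DataFrame(cr, index=df2.columns, columns=df1.columns)

# Hypothetical sample data with equal row counts and no NaN values.
rng = np.random.default_rng(2)
df1 = pd.DataFrame(rng.normal(size=(200, 2)), columns=["s1", "s2"])
df2 = pd.DataFrame(rng.normal(size=(200, 3)), columns=["i1", "i2", "i3"])

fast = corrmatrix(df1, df2)
slow = df1.apply(lambda s: df2.corrwith(s))

# Both approaches should agree up to floating-point error.
print(np.allclose(fast.values, slow.values))
```

    Because it goes through np.corrcoef, this version does not skip missing values the way the pandas methods do, so any NaN in the inputs would have to be handled beforehand.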
    