List Highest Correlation Pairs from a Large Correlation Matrix in Pandas?

后端 未结 13 579
心在旅途
心在旅途 2020-12-22 17:45

How do you find the top correlations in a correlation matrix with Pandas? There are many answers on how to do this with R (Show correlations as an ordered list, not as a lar

13条回答
  •  情书的邮戳
    2020-12-22 18:24

    You can use DataFrame.values to get an numpy array of the data and then use NumPy functions such as argsort() to get the most correlated pairs.

    But if you want to do this in pandas, you can unstack and sort the DataFrame:

    import pandas as pd
    import numpy as np
    
    shape = (50, 4460)
    
    data = np.random.normal(size=shape)
    
    data[:, 1000] += data[:, 2000]
    
    df = pd.DataFrame(data)
    
    c = df.corr().abs()
    
    s = c.unstack()
    so = s.sort_values(kind="quicksort")
    
    print so[-4470:-4460]
    

    Here is the output:

    2192  1522    0.636198
    1522  2192    0.636198
    3677  2027    0.641817
    2027  3677    0.641817
    242   130     0.646760
    130   242     0.646760
    1171  2733    0.670048
    2733  1171    0.670048
    1000  2000    0.742340
    2000  1000    0.742340
    dtype: float64
    

提交回复
热议问题