List Highest Correlation Pairs from a Large Correlation Matrix in Pandas?

心在旅途 2020-12-22 17:45

How do you find the top correlations in a correlation matrix with Pandas? There are many answers on how to do this with R (Show correlations as an ordered list, not as a lar

13 answers
  •  轻奢々 (original poster)
    2020-12-22 18:33

    A solution in a few lines, without redundant pairs of variables:

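    import numpy as np

    # df is assumed to be an existing DataFrame with numeric columns
    corr_matrix = df.corr().abs()

    # The matrix is symmetric, so keep only the upper triangle without the diagonal (k=1)
    upper = np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)

    sol = (corr_matrix.where(upper)
                      .stack()
                      .dropna()  # masked-out cells are NaN; harmless if stack() already dropped them
                      .sort_values(ascending=False))

    # The first element of sol is the pair with the largest absolute correlation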
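
    To see what the upper-triangle mask does, here is a minimal sketch on a toy DataFrame. The column names x, y, z and their values are made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen so the three pairwise correlations differ
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 1, 4, 3, 5],
    "z": [3, 1, 2, 5, 4],
})

corr_matrix = df.corr().abs()

# Boolean mask: True strictly above the diagonal, False on and below it
mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
print(mask)

# where() replaces the masked-out cells with NaN, stack() flattens the matrix
# into a Series keyed by (row, column), and dropna() removes any NaN left over
sol = corr_matrix.where(mask).stack().dropna().sort_values(ascending=False)
print(sol)
```

    With n columns this should leave n * (n - 1) / 2 entries, one per unordered pair, so each pair appears once and no column is paired with itself.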
    

    Then you can iterate over the variable pairs and their values. The index of the resulting pandas.Series is a MultiIndex, so each key is a tuple of two column names:

    for index, value in sol.items():
        # do some stuff with the pair, e.g. print it
        print(index, value)
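
    Since each key is a tuple, it can be unpacked directly in the loop header. The sketch below rebuilds the Series on hypothetical toy data (the x, y, z columns and the var_1, var_2, abs_corr names are assumptions for illustration) and also shows one possible way to get the top pairs as a flat table.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 1, 4, 3, 5],
    "z": [3, 1, 2, 5, 4],
})

corr_matrix = df.corr().abs()
mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
sol = corr_matrix.where(mask).stack().dropna().sort_values(ascending=False)

# Each key of sol is a (row_label, column_label) tuple, so unpack it in place
pairs = []
for (var_1, var_2), value in sol.items():
    pairs.append((var_1, var_2, value))
    print(f"{var_1} ~ {var_2}: {value:.2f}")

# Alternative: name the two index levels and turn them into ordinary columns
top = sol.head(2).rename_axis(["var_1", "var_2"]).reset_index(name="abs_corr")
print(top)
```

    The table form may be more convenient when the result is going to be filtered, exported, or joined with other data.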
    
