List Highest Correlation Pairs from a Large Correlation Matrix in Pandas?

心在旅途 2020-12-22 17:45

How do you find the top correlations in a correlation matrix with Pandas? There are many answers on how to do this with R (Show correlations as an ordered list, not as a lar

13 Answers
  •  萌比男神i
    2020-12-22 18:24

    I didn't want to unstack or over-complicate this issue, since I just wanted to drop some highly correlated features as part of a feature selection phase.

    So I ended up with the following simplified solution:

    # map features to their absolute correlation values
    corr = features.corr().abs()
    
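    # zero out self correlations (the diagonal); note that this also
    # zeroes any other pair whose absolute correlation is exactly 1
    corr[corr == 1] = 0
    
    # for each feature, find its max correlation with any other feature
    # and sort the resulting Series in descending order
    corr_cols = corr.max().sort_values(ascending=False)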
    
    # display the highly correlated features
    display(corr_cols[corr_cols > 0.8])
    

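    If you want to drop correlated features, you can iterate over the filtered corr_cols Series and remove every other entry (the odd-indexed or the even-indexed ones). This works because the two features of a correlated pair share the same maximum value and so sit next to each other after sorting; when three or more features are mutually correlated the alternation can break down, so check the result in that case.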
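
For the dropping step, a minimal sketch of one common alternative is shown here. It is not the method used in this answer: it masks the correlation matrix to its strict upper triangle and drops any column that is highly correlated with an earlier column. The toy `features` frame, the column names `a`, `b`, `c`, and the `threshold` value are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "b" is nearly a copy of "a", "c" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
features = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.05, size=200),
    "c": rng.normal(size=200),
})

# absolute correlation between every pair of features
corr = features.corr().abs()

# keep only the strict upper triangle, so each pair appears once
# and the diagonal (self correlation) is excluded
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# drop every column that is highly correlated with an earlier column
threshold = 0.8  # assumed cutoff, same value as in the answer above
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = features.drop(columns=to_drop)

print(to_drop)
print(list(reduced.columns))
```

Because the diagonal is masked out rather than matched by value, a pair whose correlation is exactly 1 is still detected. Which member of a pair gets dropped depends only on column order, so reorder the columns first if one feature should be preferred.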
