How can we show ONLY features that are correlated over a certain threshold in a heatmap?

Submitted by 做~自己de王妃 on 2021-02-08 11:47:23

Question


I've got too many features in a data frame. I'm trying to plot ONLY the features whose correlation is above a certain threshold, let's say 0.8, and show those in a heatmap. I put some code together, and it runs, but I still see some white lines where there is no data, and thus no correlation. I'm also seeing values that are well under 0.8. Here is the code that I tried.

import numpy as np
import matplotlib.pyplot as plt
import seaborn
c = newdf.corr()
plt.figure(figsize=(10,10))
seaborn.heatmap(c, cmap='RdYlGn_r', mask = (np.abs(c) >= 0.8))
plt.show()

When I run that, I see this.

What is wrong here?

I am making a small update, with some new findings.

This keeps ONLY the cells with corr >= 0.8.

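import matplotlib.pyplot as plt
import seaborn as sns
corr = newdf.corr()
kot = corr[corr>=.8]  # cells below 0.8 become NaN and are drawn blank
plt.figure(figsize=(12,8))
sns.heatmap(kot, cmap="Reds")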

That seems to work, but it still gives me a lot of white! I thought there should be a way to include only the features that have a correlation over a certain amount. Maybe you have to copy the features with a correlation above 0.8 to a new data frame and build the correlation off of that object. I'm not sure how this works.
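The idea in the last paragraph (keep only the features that are strongly correlated with at least one other feature, then plot the reduced matrix) might be sketched as follows. This is a minimal sketch, not part of the original post: the function name `strongly_correlated_subset`, the `demo` data frame and its columns are hypothetical, and it assumes the diagonal should be ignored because every feature correlates perfectly with itself.

```python
import numpy as np
import pandas as pd


def strongly_correlated_subset(df, threshold=0.8):
    """Hypothetical helper: correlation matrix restricted to features that
    have |correlation| >= threshold with at least one *other* feature."""
    corr = df.select_dtypes("number").corr()
    strength = corr.abs().to_numpy().copy()
    # Zero the diagonal so a feature's self-correlation (always 1) is ignored.
    np.fill_diagonal(strength, 0.0)
    # NaN entries (e.g. from constant columns) compare as False and are dropped.
    keep = corr.columns[(strength >= threshold).any(axis=0)]
    return corr.loc[keep, keep]


# Toy data (assumed for illustration): "b" is a linear function of "a",
# while "c" alternates in sign and is only weakly related to either.
x = np.arange(10, dtype=float)
demo = pd.DataFrame({"a": x, "b": 2 * x + 1, "c": [1.0, -1.0] * 5})

subset = strongly_correlated_subset(demo, threshold=0.8)
print(list(subset.columns))
```

Passing `subset` to `sns.heatmap(subset, cmap="Reds")` should then give a heatmap with no empty rows or columns. Individual cells can still fall below the threshold, because two surviving features may each be strongly correlated with a third feature but not with each other.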


Answer 1:


The following code groups the strongly correlated features (those with a correlation above 0.8 in magnitude) into connected components and plots the correlation matrix of each component individually. Please let me know if it differs from what you want.

threshold = 0.8  # minimum absolute correlation that links two features
corr = newdf.corr()

components = list()
visited = set()
for col in corr.columns:
    if col in visited:
        continue

    # Breadth-first search over the features linked to col by strong correlations
    component = set([col, ])
    just_visited = [col, ]
    visited.add(col)
    while just_visited:
        c = just_visited.pop(0)
        for idx, val in corr[c].items():
            if abs(val) > threshold and idx not in visited:
                just_visited.append(idx)
                visited.add(idx)
                component.add(idx)
    components.append(component)

for component in components:
    labels = list(component)  # recent pandas versions reject a set as an indexer
    plt.figure(figsize=(12,8))
    sns.heatmap(corr.loc[labels, labels], cmap="Reds")
plt.show()
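To check the grouping step without opening any figures, the same breadth-first search can be wrapped in a function and run on toy data. This is a sketch under assumptions: the name `correlated_groups`, the `toy` data frame and the choice to skip single-feature groups (which would otherwise each produce a 1x1 heatmap) are illustrative additions, not part of the original answer.

```python
import numpy as np
import pandas as pd


def correlated_groups(corr, threshold=0.8):
    """Hypothetical helper: group features linked by |correlation| > threshold."""
    groups, visited = [], set()
    for start in corr.columns:
        if start in visited:
            continue
        group, queue = [start], [start]
        visited.add(start)
        while queue:
            current = queue.pop(0)
            for other, value in corr[current].items():
                if abs(value) > threshold and other not in visited:
                    visited.add(other)
                    queue.append(other)
                    group.append(other)
        if len(group) > 1:  # a lone feature has nothing to be plotted against
            groups.append(group)
    return groups


# Toy data (assumed): a/b move together, c/d are exact opposites,
# and the two pairs are only weakly related to each other.
x = np.arange(10, dtype=float)
toy = pd.DataFrame({
    "a": x,
    "b": 2 * x + 1,
    "c": [1.0, -1.0] * 5,
    "d": [-1.0, 1.0] * 5,
})
groups = correlated_groups(toy.corr())
print(groups)
```

Each returned group is a plain list, so it can be used directly as `corr.loc[group, group]` when drawing one heatmap per group.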


Source: https://stackoverflow.com/questions/64019509/how-can-we-show-only-features-that-are-correlated-over-a-certain-threshold-in-a
