Plot correlation matrix using pandas

渐次进展 2020-11-30 16:23

I have a data set with a huge number of features, so analysing the correlation matrix has become very difficult. I want to plot a correlation matrix which we get using d

12 Answers
  •  独厮守ぢ
    2020-11-30 16:40

    You can observe the relations between features by drawing either a heat map with seaborn or a scatter matrix with pandas.

    Scatter Matrix:

    import pandas as pd
    pd.plotting.scatter_matrix(dataframe, alpha=0.3, figsize=(14, 8), diagonal='kde')
    

    If you also want to visualize each feature's skewness, use seaborn pairplots.

    sns.pairplot(dataframe)
    

    Seaborn Heatmap:

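    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns

    f, ax = plt.subplots(figsize=(10, 8))
    corr = dataframe.corr()
    # All-False mask: every cell of the correlation matrix is drawn
    sns.heatmap(corr, mask=np.zeros_like(corr, dtype=bool),
                cmap=sns.diverging_palette(220, 10, as_cmap=True),
                square=True, ax=ax)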
    

    The output is a correlation map of the features, as in the example below.

    The correlation between grocery and detergents is high. Similarly:

    Products With High Correlation:
    1. Grocery and Detergents
    Products With Medium Correlation:
    1. Milk and Grocery
    2. Milk and Detergents_Paper
    Products With Low Correlation:
    1. Milk and Deli
    2. Frozen and Fresh
    3. Frozen and Deli
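
    With many features, reading such pairs off a heatmap by eye gets tedious. One possible way to list them instead is to flatten the correlation matrix and sort the pairs by absolute correlation. The sketch below uses made-up stand-in data, and the helper name rank_correlated_pairs is hypothetical, not part of pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; replace with your own DataFrame.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
dataframe = pd.DataFrame({
    "Grocery": base + rng.normal(scale=0.1, size=200),
    "Detergents_Paper": base + rng.normal(scale=0.2, size=200),
    "Frozen": rng.normal(size=200),
})


def rank_correlated_pairs(df):
    """Return feature pairs sorted by absolute correlation, strongest first."""
    corr = df.corr()
    # Keep only the strict upper triangle so each pair appears exactly once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Flatten to a Series indexed by (feature_a, feature_b); drop masked cells
    pairs = upper.stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)


ranked = rank_correlated_pairs(dataframe)
print(ranked)
```

    The first rows of ranked are the strongest relations and the last rows the weakest, so ranked.head(20) gives a short list to inspect even when the full heatmap is unreadable.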

    From Pairplots: You can observe the same set of relations from pairplots or a scatter matrix. In addition, their diagonal plots show whether each feature looks normally distributed or not.

    Note: The graph above is drawn from the same data that was used for the heatmap.
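
    When there are too many features to inspect every diagonal plot, a rough numeric check of skewness may help. The sketch below uses DataFrame.skew() on made-up data; the column names are assumptions chosen for illustration. A value near zero only suggests a symmetric distribution, it does not prove normality:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: one symmetric and one right-skewed feature.
rng = np.random.default_rng(0)
dataframe = pd.DataFrame({
    "symmetric": rng.normal(size=1000),
    "right_skewed": rng.lognormal(size=1000),
})

# Sample skewness per column: about 0 for symmetric data,
# clearly positive for a long right tail
skewness = dataframe.skew()
print(skewness.sort_values(ascending=False))
```

    Features with a large absolute skewness are the ones worth a closer look in a pairplot, or candidates for a transform such as a logarithm before computing correlations.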
