I have a data set with a huge number of features, so analysing the correlation matrix has become very difficult. I want to plot a correlation matrix which we get using d
You can observe the relations between features either by drawing a heatmap with seaborn or a scatter matrix with pandas.
Scatter Matrix:
import pandas as pd
pd.plotting.scatter_matrix(dataframe, alpha=0.3, figsize=(14, 8), diagonal='kde')
If you also want to visualize each feature's skewness, use seaborn pairplots.
sns.pairplot(dataframe)
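If you would rather put a number on the skewness than judge it from the diagonal plots, pandas can compute it per column. This is only a minimal sketch: the data is synthetic and the column names `symmetric` and `right_skewed` are made up for illustration, so swap in your own dataframe.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (hypothetical column names).
rng = np.random.default_rng(0)
dataframe = pd.DataFrame({
    "symmetric": rng.normal(size=1000),         # roughly bell-shaped
    "right_skewed": rng.lognormal(size=1000),   # long right tail
})

# Sample skewness per numeric column: values near 0 suggest a symmetric
# distribution, large positive values a long right tail.
skewness = dataframe.skew()
print(skewness.sort_values(ascending=False))
```

A skewness close to 0 does not prove normality, it only says the distribution is roughly symmetric, so treat it as a quick screen next to the plots.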
Seaborn Heatmap:
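import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
f, ax = plt.subplots(figsize=(10, 8))
corr = dataframe.corr()
# An all-False mask hides nothing, so the full matrix is drawn
sns.heatmap(corr, mask=np.zeros_like(corr, dtype=bool), cmap=sns.diverging_palette(220, 10, as_cmap=True),
            square=True, ax=ax)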
The output will be a correlation map of the features, as in the example below.
The correlation between grocery and detergents is high. Similarly:
Products With High Correlation:
From Pairplots: You can observe the same set of relations from pairplots or a scatter matrix. In addition, these plots let you judge whether the data is normally distributed or not.
Note: The graph above is drawn from the same data that was used for the heatmap.
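With a huge number of features, reading the strong pairs off a heatmap gets hard, so it can help to list them numerically as well. The following is a rough sketch, not part of the plots above: the data is synthetic, and the column names and the choice of showing the top 3 pairs are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data; names and values are made up.
rng = np.random.default_rng(0)
grocery = rng.normal(size=200)
dataframe = pd.DataFrame({
    "grocery": grocery,
    "detergents": 0.9 * grocery + rng.normal(scale=0.3, size=200),
    "milk": 0.5 * grocery + rng.normal(scale=0.8, size=200),
    "fresh": rng.normal(size=200),
})

corr = dataframe.corr()

# Keep only the strict upper triangle (k=1 drops the diagonal),
# so every feature pair appears exactly once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flatten to a Series indexed by (feature_a, feature_b), drop the
# masked entries, and sort by absolute correlation, strongest first.
pairs = upper.stack().dropna()
pairs = pairs.sort_values(key=lambda s: s.abs(), ascending=False)

print(pairs.head(3))
```

Sorting by absolute value keeps strong negative correlations near the top too, which a quick look at a colour map can miss.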