Question
I am plotting the distribution of each column of my data frame with sns.distplot() from the seaborn library. For a single column I do:
x = df['A']
sns.distplot(x);
I am trying to use FacetGrid and map to draw the plots for all columns at once, as shown here, but it doesn't work at all:
g = sns.FacetGrid(df, col = 'A','B','C','D','E')
g.map(sns.distplot())
Answer 1:
I think you need to use melt to reshape your dataframe to long format. See this MCVE:
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))
dfm = df.melt(var_name='columns')
g = sns.FacetGrid(dfm, col='columns')
g = g.map(sns.distplot, 'value')
Output: [figure: one panel per column, A to E, each showing the distribution of that column's values]
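To make the reshaping step concrete, here is a minimal sketch of what melt does to a wide frame. The small 4x3 frame and the names rng and long_df are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (assumed data): 4 rows, 3 numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 3)), columns=list("ABC"))

# melt stacks every column into one "value" column and records the
# originating column name in the column given by var_name.
long_df = df.melt(var_name="columns")

print(long_df.shape)               # 4 rows x 3 columns become 12 rows x 2 columns
print(list(long_df.columns))       # the name column plus the default "value" column
print(long_df["columns"].unique())
```

The "columns" column is what FacetGrid can use for its col argument, since it tells the grid which panel each value belongs to.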
Answer 2:
You're getting this wrong on two levels.
Python syntax. FacetGrid(df, col = 'A','B','C','D','E') is invalid, because col gets set to 'A' and the remaining strings are interpreted as further positional arguments. Positional arguments may not follow a keyword argument, so this is invalid Python syntax.
Seaborn concepts.
Seaborn expects a single column name as input for the col or row argument. This means that the dataframe needs to be in a format that has one column which determines to which column or row of the grid the respective datum belongs.
You do not call the function to be used by map yourself. The idea is that map itself calls it, so you pass the function object (sns.distplot, without parentheses).
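Both points can be checked without seaborn. The sketch below uses only the standard library: compile() parses the call from the question without running it, and apply_to_groups is a hypothetical helper standing in for map (it is not a seaborn function).

```python
# The call from the question, kept as a string so it is only parsed, never run.
call_source = "FacetGrid(df, col='A', 'B', 'C', 'D', 'E')"

try:
    compile(call_source, "<example>", "eval")
    is_valid = True
except SyntaxError as exc:
    # Positional arguments after a keyword argument are rejected by the parser.
    is_valid = False
    print("SyntaxError:", exc.msg)


def apply_to_groups(func, groups):
    """Hypothetical stand-in for map: it calls func itself, once per group."""
    return [func(values) for values in groups]


groups = [[1, 2, 3], [10, 20]]

# Correct: pass the function object; the helper decides when to call it.
totals = apply_to_groups(sum, groups)
print(totals)

# Incorrect: sum() is evaluated first, with no arguments, and fails
# before apply_to_groups is ever entered.
try:
    apply_to_groups(sum(), groups)
    called_too_early = False
except TypeError:
    called_too_early = True
print(called_too_early)
```

The same reasoning applies to g.map(sns.distplot, "value"): the function is handed over uncalled, followed by the name of the column it should receive.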
Solutions:
Loop over columns:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame(np.random.randn(14,5), columns=list("ABCDE"))

fig, axes = plt.subplots(ncols=5)
for ax, col in zip(axes, df.columns):
    sns.distplot(df[col], ax=ax)

plt.show()
Melt the dataframe:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame(np.random.randn(14,5), columns=list("ABCDE"))

g = sns.FacetGrid(df.melt(), col="variable")
g.map(sns.distplot, "value")

plt.show()
Answer 3:
I think the easiest approach is to just loop over the columns and create a plot for each.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))

# draw one histogram figure per column
for col in df.columns:
    hist = df[col].hist(bins=10)
    print("Plotting for column {}".format(col))
    plt.show()
Answer 4:
You can use the following:
# listing dataframes types
list(set(df.dtypes.tolist()))
# include only float and integer
df_num = df.select_dtypes(include = ['float64', 'int64'])
# display what has been selected
df_num.head()
# plot
df_num.hist(figsize=(16, 20), bins=50, xlabelsize=8, ylabelsize=8);
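One caveat with listing 'float64' and 'int64' explicitly is that numeric columns stored with other widths (for example int32) are left out. A small sketch, using a made-up three-column frame, compares that filter with the generic "number" selector that select_dtypes also accepts:

```python
import pandas as pd

# Illustrative frame (assumed data): one int32, one float64, one string column.
df = pd.DataFrame({
    "a": pd.Series([1, 2, 3], dtype="int32"),
    "b": pd.Series([0.5, 1.5, 2.5], dtype="float64"),
    "c": ["x", "y", "z"],
})

# Explicit dtype list: only exact matches are kept, so the int32 column is dropped.
narrow = df.select_dtypes(include=["float64", "int64"])

# "number" matches every numeric dtype regardless of width.
broad = df.select_dtypes(include="number")

print(list(narrow.columns))
print(list(broad.columns))
```

Whether the broader selector is preferable depends on the data; if every numeric column is already float64 or int64, the two filters select the same columns.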
Source: https://stackoverflow.com/questions/50952133/distribution-probabilities-for-each-column-data-frame-in-one-plot