Pandas histogram df.hist() group by

问题

How to plot a histogram with pandas DataFrame.hist() using group by? I have a data frame with 5 columns: "A", "B", "C", "D" and "Group"

There are two Groups classes: "yes" and "no"

Using:

df.hist()

I get the hist for each of the 4 columns.

Now I would like to get the same 4 graphs but with blue bars (group="yes") and red bars (group = "no").

I tried this withouth success:

df.hist(by = "group")

回答1:

This is not the most flexible workaround but will work for your question specifically.

def sephist(col):
    yes = df[df['group'] == 'yes'][col]
    no = df[df['group'] == 'no'][col]
    return yes, no

for num, alpha in enumerate('abcd'):
    plt.subplot(2, 2, num)
    plt.hist(sephist(alpha)[0], bins=25, alpha=0.5, label='yes', color='b')
    plt.hist(sephist(alpha)[1], bins=25, alpha=0.5, label='no', color='r')
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)

You could make this more generic by:

adding a df and by parameter to sephist: def sephist(df, by, col)
making the subplots loop more flexible: for num, alpha in enumerate(df.columns)

Because the first argument to matplotlib.pyplot.hist can take

either a single array or a sequency of arrays which are not required to be of the same length

...an alternattive would be:

for num, alpha in enumerate('abcd'):
    plt.subplot(2, 2, num)
    plt.hist((sephist(alpha)[0], sephist(alpha)[1]), bins=25, alpha=0.5, label=['yes', 'no'], color=['r', 'b'])
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)

回答2:

Using Seaborn

If you are open to use Seaborn, a plot with multiple subplots and multiple variables within each subplot can easily be made using seaborn.FacetGrid.

import numpy as np; np.random.seed(1)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(300,4), columns=list("ABCD"))
df["group"] = np.random.choice(["yes", "no"], p=[0.32,0.68],size=300)

df2 = pd.melt(df, id_vars='group', value_vars=list("ABCD"), value_name='value')

bins=np.linspace(df2.value.min(), df2.value.max(), 10)
g = sns.FacetGrid(df2, col="variable", hue="group", palette="Set1", col_wrap=2)
g.map(plt.hist, 'value', bins=bins, ec="k")

g.axes[-1].legend()
plt.show()

来源：https://stackoverflow.com/questions/45883598/pandas-histogram-df-hist-group-by

标签

pandas

matplotlib

histogram