问题
It seems like plotting a line connecting the mean values of box plots would be a simple thing to do, but I couldn't figure out how to do this plot in pandas.
I'm using this syntax to do the boxplot so that it automatically generate the box plot for Y vs. X device without having to do external manipulation of the data frame:
df.boxplot(column='Y_Data', by="Category", showfliers=True, showmeans=True)
One way I thought of doing is to just do a line plot by getting the mean values from the boxplot, but I'm not sure how to extract that information from the plot.
回答1:
You can save the axis object that gets returned from df.boxplot()
, and plot the means as a line plot using that same axis. I'd suggest using Seaborn's pointplot for the lines, as it handles a categorical x-axis nicely.
First let's generate some sample data:
import pandas as pd
import numpy as np
import seaborn as sns
N = 150
values = np.random.random(size=N)
groups = np.random.choice(['A','B','C'], size=N)
df = pd.DataFrame({'value':values, 'group':groups})
print(df.head())
group value
0 A 0.816847
1 A 0.468465
2 C 0.871975
3 B 0.933708
4 A 0.480170
...
Next, make the boxplot and save the axis object:
ax = df.boxplot(column='value', by='group', showfliers=True,
positions=range(df.group.unique().shape[0]))
Note: There's a curious positions
argument in Pyplot/Pandas boxplot()
, which can cause off-by-one errors. See more in this discussion, including the workaround I've employed here.
Finally, use groupby
to get category means, and then connect mean values with a line plot overlaid on top of the boxplot:
sns.pointplot(x='group', y='value', data=df.groupby('group', as_index=False).mean(), ax=ax)
Your title mentions "median" but you talk about category means in your post. I used means here; change the groupby
aggregation to median()
if you want to plot medians instead.
来源:https://stackoverflow.com/questions/44039146/python-2-7-and-pandas-boxplot-connecting-median-values