Python 2.7 and Pandas Boxplot connecting median values

Submitted by 陌路散爱 on 2020-08-02 10:55:26

Question


It seems like plotting a line connecting the mean values of box plots would be a simple thing to do, but I couldn't figure out how to do this plot in pandas.

I'm using this syntax for the boxplot so that it automatically generates the box plot for Y vs. X device without any external manipulation of the data frame:

df.boxplot(column='Y_Data', by="Category", showfliers=True, showmeans=True)

One approach I thought of is to draw a line plot using the mean values from the boxplot, but I'm not sure how to extract that information from the plot.


Answer 1:


You can save the axis object that gets returned from df.boxplot(), and plot the means as a line plot using that same axis. I'd suggest using Seaborn's pointplot for the lines, as it handles a categorical x-axis nicely.

First let's generate some sample data:

import pandas as pd
import numpy as np
import seaborn as sns

N = 150
values = np.random.random(size=N)
groups = np.random.choice(['A','B','C'], size=N)
df = pd.DataFrame({'value':values, 'group':groups})

print(df.head())
  group     value
0     A  0.816847
1     A  0.468465
2     C  0.871975
3     B  0.933708
4     A  0.480170
              ...

Next, make the boxplot and save the axis object:

ax = df.boxplot(column='value', by='group', showfliers=True,
                positions=range(df['group'].nunique()))

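Note: By default, boxplot() in Pyplot/Pandas draws the boxes at x positions 1 to N, whereas Seaborn places categories at 0 to N-1, so overlaying the two without adjustment causes an off-by-one misalignment. Passing the positions argument explicitly, as above, is the workaround I've employed here. See more in this discussion.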
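To see the offset concretely, here is a minimal sketch that reads the box centers back from the plot artists. It calls matplotlib's Axes.boxplot directly (which pandas forwards to) rather than df.boxplot, and the sample data and the box_centers helper are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
data = [rng.random_sample(50) for _ in range(3)]  # three hypothetical groups


def box_centers(bp):
    """Return the x-coordinate at the middle of each box's median line."""
    return [round(float(np.mean(line.get_xdata())), 6) for line in bp['medians']]


fig, (ax_default, ax_shifted) = plt.subplots(1, 2)

# Default positions start at 1.
default_centers = box_centers(ax_default.boxplot(data))

# Explicit positions starting at 0, matching Seaborn's categorical axis.
shifted_centers = box_centers(ax_shifted.boxplot(data, positions=range(3)))

print(default_centers)
print(shifted_centers)
```

The same artist dictionary also exposes the median value of each box through get_ydata(), which is one way to pull statistics back out of a plot that has already been drawn.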

Finally, use groupby to get category means, and then connect mean values with a line plot overlaid on top of the boxplot:

sns.pointplot(x='group', y='value', data=df.groupby('group', as_index=False).mean(), ax=ax)

Your title mentions "median" but you talk about category means in your post. I used means here; change the groupby aggregation to median() if you want to plot medians instead.
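If you'd rather not depend on Seaborn, a plain matplotlib line works too. This is a rough sketch rather than part of the approach above: it leaves the boxes at their default positions (1 to N) and plots the group medians at those same x values. The seed, the red line styling and the output filename are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
N = 150
df = pd.DataFrame({
    'value': rng.random_sample(size=N),
    'group': rng.choice(['A', 'B', 'C'], size=N),
})

# Create the Axes ourselves and hand it to pandas so we know what we draw on.
fig, ax = plt.subplots()
df.boxplot(column='value', by='group', showfliers=True, ax=ax)

# groupby sorts the group labels, which matches the left-to-right box order.
medians = df.groupby('group')['value'].median()

# Boxes sit at x = 1..N by default, so plot the medians at the same x values.
xs = np.arange(1, len(medians) + 1)
(line,) = ax.plot(xs, medians.values, marker='o', color='red', label='median')
ax.legend()

fig.savefig('boxplot_medians.png')
```

Swap median() for mean() to connect the means instead.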



Source: https://stackoverflow.com/questions/44039146/python-2-7-and-pandas-boxplot-connecting-median-values
