How to put multiple median values in the boxplot?

家住魔仙堡 submitted on 2020-01-24 19:39:25

Question


I found code that puts the median value on a boxplot and tried it. But my boxplot is grouped (it uses hue), so iterating over the x-tick labels does not give me the position of each individual box. How can I find the x location of every box in a grouped boxplot? I tried the minor tick locator but still could not get the positions. Any suggestions for improving this plot?

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame([['Apple', 10, 'A'],['Apple', 8, 'B'],['Apple', 10, 'C'],
              ['Apple', 5, 'A'],['Apple', 7, 'B'],['Apple', 9, 'C'],
              ['Apple', 3, 'A'],['Apple', 5, 'B'],['Apple', 4, 'C'],
              ['Orange', 3, 'A'],['Orange', 4, 'B'],['Orange', 6, 'C'],
              ['Orange', 2, 'A'],['Orange', 8, 'B'],['Orange', 4, 'C'],
              ['Orange', 8, 'A'],['Orange', 10, 'B'],['Orange', 1, 'C']])

df.columns = ['item', 'score', 'grade']


fig = plt.figure(figsize=(6, 3), dpi=150)

ax = sns.boxplot(x='item', y='score', data=df, hue='grade', palette=sns.color_palette('husl'))
ax.legend(loc='lower right', bbox_to_anchor=(1.11, 0), ncol=1, fontsize = 'x-small').set_title('')

medians = df.groupby(['item','grade'])['score'].median().values
median_labels = [str(np.round(s, 2)) for s in medians]

pos = range(len(medians))
for tick,label in zip(pos, ax.get_xticklabels()):
    ax.text(pos[tick], medians[tick], median_labels[tick], 
            horizontalalignment='center', size='xx-small', color='w', weight='semibold', bbox=dict(facecolor='#445A64'))


Answer 1:


Seaborn is notoriously difficult to work with. The code below works, but it might break if, for example, one of the categories is empty and no box is drawn for it. Use it at your own risk:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame([['Apple', 10, 'A'],['Apple', 8, 'B'],['Apple', 10, 'C'],
              ['Apple', 5, 'A'],['Apple', 7, 'B'],['Apple', 9, 'C'],
              ['Apple', 3, 'A'],['Apple', 5, 'B'],['Apple', 4, 'C'],
              ['Orange', 3, 'A'],['Orange', 4, 'B'],['Orange', 6, 'C'],
              ['Orange', 2, 'A'],['Orange', 8, 'B'],['Orange', 4, 'C'],
              ['Orange', 8, 'A'],['Orange', 10, 'B'],['Orange', 1, 'C']])

df.columns = ['item', 'score', 'grade']


width = 0.8
hue_col = 'grade'

fig = plt.figure(figsize=(6, 3), dpi=150)
ax = sns.boxplot(x='item', y='score', data=df, hue=hue_col, palette=sns.color_palette('husl'), width=width)
ax.legend(loc='lower right', bbox_to_anchor=(1.11, 0), ncol=1, fontsize = 'x-small').set_title('')

# get the offsets used by boxplot when hue-nesting is used
# https://github.com/mwaskom/seaborn/blob/c73055b2a9d9830c6fbbace07127c370389d04dd/seaborn/categorical.py#L367
n_levels = len(df[hue_col].unique())
each_width = width / n_levels
offsets = np.linspace(0, width - each_width, n_levels)
offsets -= offsets.mean()

medians = df.groupby(['item','grade'])['score'].median()

for x0,(_,med0) in enumerate(medians.groupby(level=0)):
    for off,(_,med1) in zip(offsets,med0.groupby(level=1)):
        ax.text(x0+off, med1.item(), '{:.0f}'.format(med1.item()), 
            horizontalalignment='center', va='center', size='xx-small', color='w', weight='semibold', bbox=dict(facecolor='#445A64'))
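
The offset arithmetic above can be checked on its own, without drawing anything. The following is a minimal sketch in plain Python; `hue_offsets` is a hypothetical helper name, not a seaborn function, and it assumes the nesting layout of the seaborn version linked in the comment above (boxes of equal width, centered as a group on the category tick). It is algebraically the same as the `np.linspace` computation followed by subtracting the mean.

```python
def hue_offsets(n_levels, width=0.8):
    """Return the x offset of each hue level's box relative to its category tick.

    Hypothetical helper: mirrors the linspace-then-center computation above.
    """
    each_width = width / n_levels
    # Box centers are each_width apart and symmetric around zero.
    return [(i - (n_levels - 1) / 2) * each_width for i in range(n_levels)]


offsets = hue_offsets(3, width=0.8)
print([round(o, 4) for o in offsets])  # [-0.2667, 0.0, 0.2667]
```

With three grades and width 0.8, the box for grade 'A' of the first item is therefore expected near x = -0.27, 'B' at x = 0 and 'C' near x = 0.27; the second item uses the same offsets around x = 1.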

In general, to avoid any surprises when you want to modify a seaborn plot, I would recommend specifying order and hue_order so that the plot is drawn in a predetermined order. Here is another version that can deal with a missing category:

df = pd.DataFrame([['Apple', 8, 'B'],['Apple', 10, 'C'],
              ['Apple', 7, 'B'],['Apple', 9, 'C'],
              ['Apple', 5, 'B'],['Apple', 4, 'C'],
              ['Orange', 3, 'A'],['Orange', 6, 'C'],
              ['Orange', 2, 'A'],['Orange', 4, 'C'],
              ['Orange', 8, 'A'],['Orange', 1, 'C']])

df.columns = ['item', 'score', 'grade']


order = ['Apple', 'Orange']
hue_col = 'grade'
hue_order = ['A','B','C']
width = 0.8

fig = plt.figure(figsize=(6, 3), dpi=150)
ax = sns.boxplot(x='item', y='score', data=df, hue=hue_col, palette=sns.color_palette('husl'), width=width,
                order=order, hue_order=hue_order)
ax.legend(loc='lower right', bbox_to_anchor=(1.11, 0), ncol=1, fontsize = 'x-small').set_title('')

# get the offsets used by boxplot when hue-nesting is used
# https://github.com/mwaskom/seaborn/blob/c73055b2a9d9830c6fbbace07127c370389d04dd/seaborn/categorical.py#L367
# count the levels from hue_order so a level absent from the data still reserves its slot
n_levels = len(hue_order)
each_width = width / n_levels
offsets = np.linspace(0, width - each_width, n_levels)
offsets -= offsets.mean()

medians = df.groupby(['item','grade'])['score'].median()
medians = medians.reindex(pd.MultiIndex.from_product([order,hue_order]))

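# sort=False keeps the groups in the order given by order / hue_order
for x0,(_,med0) in enumerate(medians.groupby(level=0, sort=False)):
    for off,(_,med1) in zip(offsets,med0.groupby(level=1, sort=False)):
        # combinations without data were reindexed to NaN: no box, so no label
        if not np.isnan(med1.item()):
            ax.text(x0+off, med1.item(), '{:.0f}'.format(med1.item()), 
                horizontalalignment='center', va='center', size='xx-small', color='w', weight='semibold', bbox=dict(facecolor='#445A64'))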
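
To see which labels the loop above would place and where, the position logic can be reproduced without any plotting. This is only a sketch under the same layout assumption as the answer (each hue level gets an equal slot of width / len(hue_order), whether or not it has data); `label_positions` is a hypothetical helper, not part of seaborn or pandas, and it uses the standard library instead of a groupby.

```python
from statistics import median


def label_positions(rows, order, hue_order, width=0.8):
    """Map each (item, grade) pair that has data to (x, median score).

    Hypothetical helper: pairs without data are skipped, like the NaN check above.
    """
    each_width = width / len(hue_order)
    scores = {}
    for item, score, grade in rows:
        scores.setdefault((item, grade), []).append(score)

    positions = {}
    for i, item in enumerate(order):
        for j, grade in enumerate(hue_order):
            if (item, grade) in scores:
                # Category tick i plus the centered offset of hue slot j.
                x = i + (j - (len(hue_order) - 1) / 2) * each_width
                positions[(item, grade)] = (x, median(scores[(item, grade)]))
    return positions


# Same data as the answer: Apple has no grade 'A', Orange has no grade 'B'.
rows = [('Apple', 8, 'B'), ('Apple', 10, 'C'), ('Apple', 7, 'B'), ('Apple', 9, 'C'),
        ('Apple', 5, 'B'), ('Apple', 4, 'C'), ('Orange', 3, 'A'), ('Orange', 6, 'C'),
        ('Orange', 2, 'A'), ('Orange', 4, 'C'), ('Orange', 8, 'A'), ('Orange', 1, 'C')]

positions = label_positions(rows, ['Apple', 'Orange'], ['A', 'B', 'C'])
for key, (x, med) in positions.items():
    print(key, round(x, 4), med)
```

Only four labels come out, one per box that is actually drawn, and the empty slots ('Apple', 'A') and ('Orange', 'B') leave a gap rather than shifting the remaining labels.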



Source: https://stackoverflow.com/questions/59097497/how-to-put-multiple-median-values-in-the-boxplot
