问题
I am encountering some weird behavior in matplotlib
's boxplot
function when I am using the "notch
" shape. I am using some code that I have written a while ago and never had those issues -- I am wondering what the problem is. Any ideas?

When I turn the notch shape off it looks normal though

This would be the code:
def boxplot_modified(data):
fig = plt.figure(figsize=(8,6))
ax = plt.subplot(111)
bplot = plt.boxplot(data,
#notch=True, # notch shape
vert=True, # vertical box aligmnent
sym='ko', # red circle for outliers
patch_artist=True, # fill with color
)
# choosing custom colors to fill the boxes
colors = 3*['lightgreen'] + 3*['lightblue'], 'lightblue', 'lightblue', 'lightblue']
for patch, color in zip(bplot['boxes'], colors):
patch.set_facecolor(color)
# modifying the whiskers: straight lines, black, wider
for whisker in bplot['whiskers']:
whisker.set(color='black', linewidth=1.2, linestyle='-')
# making the caps a little bit wider
for cap in bplot['caps']:
cap.set(linewidth=1.2)
# hiding axis ticks
plt.tick_params(axis="both", which="both", bottom="off", top="off",
labelbottom="on", left="off", right="off", labelleft="on")
# adding horizontal grid lines
ax.yaxis.grid(True)
# remove axis spines
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["bottom"].set_visible(True)
ax.spines["left"].set_visible(True)
plt.xticks([y+1 for y in range(len(data))], 8*['x'])
# raised title
#plt.text(2, 1, 'Modified',
# horizontalalignment='center',
# fontsize=18)
plt.tight_layout()
plt.show()
boxplot_modified(df.values)
and when I make a plain plot without the customization, the problem still occurs:
def boxplot(data):
fig = plt.figure(figsize=(8,6))
ax = plt.subplot(111)
bplot = plt.boxplot(data,
notch=True, # notch shape
vert=True, # vertical box aligmnent
sym='ko', # red circle for outliers
patch_artist=True, # fill with color
)
plt.show()
boxplot(df.values)

回答1:
Okay, as it turns out, this is actually a correct behavior ;)
From Wikipedia:
Notched box plots apply a "notch" or narrowing of the box around the median. Notches are useful in offering a rough guide to significance of difference of medians; if the notches of two boxes do not overlap, this offers evidence of a statistically significant difference between the medians. The width of the notches is proportional to the interquartile range of the sample and inversely proportional to the square root of the size of the sample. However, there is uncertainty about the most appropriate multiplier (as this may vary depending on the similarity of the variances of the samples). One convention is to use +/-1.58*IQR/sqrt(n).
This was also discussed in an issue on GitHub; R produces a similar output as evidence that this behaviour is "correct."
Thus, if we have this weird "flipped" appearance in the notched box plots, it simply means that the 1st quartile has a lower value than the confidence of the mean and vice versa for the 3rd quartile. Although it looks ugly, it's actually useful information about the (un)confidence of the median.
A bootstrapping (random sampling with replacement to estimate parameters of a sampling distribution, here: confidence intervals) might reduce this effect:
From the plt.boxplot
documentation:
bootstrap : None (default) or integer Specifies whether to bootstrap the confidence intervals around the median for notched boxplots. If bootstrap==None, no bootstrapping is performed, and notches are calculated using a Gaussian-based asymptotic approximation (see McGill, R., Tukey, J.W., and Larsen, W.A., 1978, and Kendall and Stuart, 1967). Otherwise, bootstrap specifies the number of times to bootstrap the median to determine it's 95% confidence intervals. Values between 1000 and 10000 are recommended.
来源:https://stackoverflow.com/questions/26291082/weird-behavior-of-matplotlibs-boxplot-when-using-the-notch-shape