Weird behavior of matplotlib's boxplot when using the notch shape

筅森魡賤 submitted on 2020-01-23 06:03:56

Question


I am encountering some weird behavior in matplotlib's boxplot function when I use the "notch" shape. I am using code that I wrote a while ago and never had these issues with, so I am wondering what the problem is. Any ideas?

When I turn the notch shape off, the plot looks normal.

This would be the code:

import matplotlib.pyplot as plt

def boxplot_modified(data):

    fig = plt.figure(figsize=(8,6))
    ax = plt.subplot(111) 

    bplot = plt.boxplot(data, 
            notch=True,          # notch shape (comment out for plain boxes)
            vert=True,           # vertical box alignment
            sym='ko',            # black circles for outliers
            patch_artist=True,   # fill with color
            )   

    # choosing custom colors to fill the boxes
    colors = 3*['lightgreen'] + 3*['lightblue'] + ['lightblue', 'lightblue', 'lightblue']
    for patch, color in zip(bplot['boxes'], colors):
        patch.set_facecolor(color)

    # modifying the whiskers: straight lines, black, wider
    for whisker in bplot['whiskers']:
        whisker.set(color='black', linewidth=1.2, linestyle='-')    

    # making the caps a little bit wider 
    for cap in bplot['caps']:
        cap.set(linewidth=1.2)

    # hiding axis ticks
    plt.tick_params(axis="both", which="both", bottom=False, top=False,
            labelbottom=True, left=False, right=False, labelleft=True)

    # adding horizontal grid lines 
    ax.yaxis.grid(True) 

    # remove axis spines
    ax.spines["top"].set_visible(False)  
    ax.spines["right"].set_visible(False) 
    ax.spines["bottom"].set_visible(True) 
    ax.spines["left"].set_visible(True)

    n_boxes = len(bplot['boxes'])   # one box per column of data
    plt.xticks([y+1 for y in range(n_boxes)], n_boxes*['x'])

    # raised title
    #plt.text(2, 1, 'Modified',
    #     horizontalalignment='center',
    #     fontsize=18)

    plt.tight_layout()
    plt.show()

boxplot_modified(df.values)

And when I make a plain plot without the customization, the problem still occurs:

def boxplot(data):

    fig = plt.figure(figsize=(8,6))
    ax = plt.subplot(111) 

    bplot = plt.boxplot(data, 
            notch=True,          # notch shape 
            vert=True,           # vertical box alignment
            sym='ko',            # black circles for outliers
            patch_artist=True,   # fill with color
            )   

    plt.show()

boxplot(df.values)
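Since `df` is not shown in the question, here is a minimal, self-contained reproduction under an assumption: two hypothetical groups drawn from a normal distribution, one with only 5 observations and one with 500. The function name `notched_demo` and the output file name are illustrative only; the plot is written to a file via the non-interactive Agg backend instead of `plt.show()`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import numpy as np


def notched_demo(path="notch_demo.png"):
    """Draw notched boxes for a tiny and a large hypothetical sample."""
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for the question's df.values (not shown there).
    small = rng.normal(loc=0.0, scale=1.0, size=5)
    large = rng.normal(loc=0.0, scale=1.0, size=500)

    fig, ax = plt.subplots(figsize=(6, 4))
    bplot = ax.boxplot([small, large], notch=True, sym="ko", patch_artist=True)
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["n=5", "n=500"])
    fig.savefig(path)
    plt.close(fig)
    return bplot


bplot = notched_demo()
```

In the saved image, the n=5 box should show the notch folding back past at least one edge of the box, while the n=500 box looks ordinary.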


Answer 1:


Okay, as it turns out, this is actually correct behavior ;)

From Wikipedia:

Notched box plots apply a "notch" or narrowing of the box around the median. Notches are useful in offering a rough guide to significance of difference of medians; if the notches of two boxes do not overlap, this offers evidence of a statistically significant difference between the medians. The width of the notches is proportional to the interquartile range of the sample and inversely proportional to the square root of the size of the sample. However, there is uncertainty about the most appropriate multiplier (as this may vary depending on the similarity of the variances of the samples). One convention is to use +/-1.58*IQR/sqrt(n).
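The convention quoted above is easy to compute directly. A sketch using only Python's standard library, assuming the Wikipedia multiplier of 1.58 and inclusive (linear-interpolation) quartiles; the helper names `notch_interval` and `notch_extends_past_box` are hypothetical. As far as I know, matplotlib's own non-bootstrapped notches use a very similar constant (1.57).

```python
import math
import statistics


def notch_interval(sample, multiplier=1.58):
    """Return (low, high) = median -/+ multiplier * IQR / sqrt(n)."""
    n = len(sample)
    med = statistics.median(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    half_width = multiplier * (q3 - q1) / math.sqrt(n)
    return med - half_width, med + half_width


def notch_extends_past_box(sample, multiplier=1.58):
    """True if the notch reaches beyond Q1 or Q3 (the 'flipped' look)."""
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    low, high = notch_interval(sample, multiplier)
    return low < q1 or high > q3


small = [1, 2, 3, 4, 5]
large = list(range(1, 101))
print(notch_interval(small))          # roughly (1.587, 4.413)
print(notch_extends_past_box(small))  # True
print(notch_extends_past_box(large))  # False
```

Because (median − Q1) + (Q3 − median) = IQR, one side of the box is at most IQR/2 away from the median. So whenever 1.58/√n > 1/2, i.e. for n ≤ 9 and a nonzero IQR, at least one end of the notch must reach past the box, regardless of the data.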

This was also discussed in an issue on GitHub; R produces similar output, which suggests that this behaviour is "correct."

Thus, if we see this weird "flipped" appearance in a notched box plot, it simply means that the notch (the confidence interval of the median) extends past the box: its lower end lies below the 1st quartile and/or its upper end lies above the 3rd quartile. Although it looks ugly, it is actually useful information about the uncertainty of the median.
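To check numerically whether a given box will look "flipped", one can ask matplotlib for the statistics it draws. A sketch using `matplotlib.cbook.boxplot_stats`, assuming two hypothetical normal samples of size 5 and 500; in the returned dicts, `q1`/`q3` are the box edges and `cilo`/`cihi` are the notch ends.

```python
import numpy as np
from matplotlib import cbook

rng = np.random.default_rng(42)
sizes = (5, 500)
# Hypothetical samples; replace with your own columns.
groups = [rng.normal(size=n) for n in sizes]

notch_outside_box = []
# boxplot_stats returns one dict per group with the numbers boxplot draws.
for n, st in zip(sizes, cbook.boxplot_stats(groups)):
    outside = bool(st["cilo"] < st["q1"] or st["cihi"] > st["q3"])
    notch_outside_box.append(outside)
    print(f"n={n}: q1={st['q1']:.3f} cilo={st['cilo']:.3f} "
          f"med={st['med']:.3f} cihi={st['cihi']:.3f} "
          f"q3={st['q3']:.3f} notch outside box: {outside}")
```

For the tiny sample the notch ends fall outside the quartiles; for the large one the notch sits comfortably inside the box.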

Bootstrapping (random sampling with replacement to estimate the parameters of a sampling distribution, here the confidence interval of the median) might reduce this effect.
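The percentile bootstrap for a median can be sketched in a few lines. This is a generic illustration of the technique using Python's standard library, not matplotlib's internal code; the function name `bootstrap_median_ci`, the default of 5000 resamples, and the nearest-rank percentile lookup are assumptions of the sketch.

```python
import random
import statistics


def bootstrap_median_ci(sample, n_boot=5000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the median (sketch)."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record each resample's median.
    medians = sorted(statistics.median(rng.choices(sample, k=n))
                     for _ in range(n_boot))
    alpha = (1.0 - level) / 2.0
    # Empirical alpha and 1 - alpha quantiles (simple nearest-rank indexing).
    low = medians[int(alpha * (n_boot - 1))]
    high = medians[int((1.0 - alpha) * (n_boot - 1))]
    return low, high


print(bootstrap_median_ci([1, 2, 3, 4, 5]))
print(bootstrap_median_ci(list(range(1, 101))))
```

Every resampled median is built from values in the original sample, so a bootstrapped interval can never extend beyond the smallest or largest observation. It can, however, still reach past Q1 or Q3 for very small samples.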

From the plt.boxplot documentation:

bootstrap : None (default) or integer
Specifies whether to bootstrap the confidence intervals around the median for notched boxplots. If bootstrap==None, no bootstrapping is performed, and notches are calculated using a Gaussian-based asymptotic approximation (see McGill, R., Tukey, J.W., and Larsen, W.A., 1978, and Kendall and Stuart, 1967). Otherwise, bootstrap specifies the number of times to bootstrap the median to determine its 95% confidence intervals. Values between 1000 and 10000 are recommended.
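Switching this on is a single keyword argument. A sketch, assuming one hypothetical sample of five normal draws: it compares the notch limits that `matplotlib.cbook.boxplot_stats` reports with and without `bootstrap`, then draws both versions side by side into a file (the file name is illustrative).

```python
import matplotlib
matplotlib.use("Agg")  # render to a file instead of opening a window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cbook

# Hypothetical small sample where the asymptotic notch misbehaves.
small = np.random.default_rng(1).normal(size=5)

asym = cbook.boxplot_stats(small)[0]                  # Gaussian-based notch
boot = cbook.boxplot_stats(small, bootstrap=5000)[0]  # bootstrapped notch
print("asymptotic:", asym["cilo"], asym["cihi"])
print("bootstrap: ", boot["cilo"], boot["cihi"])

# The same switch on the plotting call itself.
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.boxplot(small, notch=True)                  # asymptotic notches
ax2.boxplot(small, notch=True, bootstrap=5000)  # bootstrapped notches
ax1.set_title("bootstrap=None")
ax2.set_title("bootstrap=5000")
fig.savefig("notch_bootstrap.png")
plt.close(fig)
```

The bootstrapped limits vary slightly from run to run because they come from random resampling, while the asymptotic limits are always symmetric around the median.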



Source: https://stackoverflow.com/questions/26291082/weird-behavior-of-matplotlibs-boxplot-when-using-the-notch-shape
