Boxplots in matplotlib: Markers and outliers

情深已故 2020-12-07 08:52

I have some questions about boxplots in matplotlib:

Question A. What do the markers that I highlighted below with Q1, Q2

6 answers
  •  北海茫月
    2020-12-07 09:43

    Here's a graphic that illustrates the components of the box from a stats.stackexchange answer. Note that k=1.5 if you don't supply the whis keyword in Pandas.

    The boxplot function in Pandas is a wrapper for matplotlib.pyplot.boxplot. The matplotlib docs explain the components of the boxes in detail:

    Question A:

    The box extends from the lower to upper quartile values of the data, with a line at the median.

    That is, a quarter of the input data values lie below the box, a quarter lie in each half of the box (on either side of the median line), and the remaining quarter lie above the box.
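To make the quartile description concrete, here is a minimal sketch that computes the box edges and median with `np.percentile` and counts how many values fall in each region. The sample data is hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 30], dtype=float)

# Box edges (Q1, Q3) and the median line are the 25th, 50th and 75th percentiles.
q1, median, q3 = np.percentile(data, [25, 50, 75])

# Count the values in each of the four regions around the box.
below_box = int(np.sum(data < q1))
lower_half = int(np.sum((data >= q1) & (data < median)))
upper_half = int(np.sum((data >= median) & (data <= q3)))
above_box = int(np.sum(data > q3))

print(f"Q1={q1}, median={median}, Q3={q3}")
print(below_box, lower_half, upper_half, above_box)
```

With this data each region holds the same number of points. With other sample sizes or tied values the split is only approximately even, because the percentiles are interpolated.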

    Question B:

    whis : float, sequence, or string (default = 1.5)

    As a float, determines the reach of the whiskers beyond the first and third quartiles. In other words, where IQR is the interquartile range (Q3 - Q1), the upper whisker will extend to the last datum less than Q3 + whis*IQR. Similarly, the lower whisker will extend to the first datum greater than Q1 - whis*IQR. Beyond the whiskers, data are considered outliers and are plotted as individual points.
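The whisker rule above can be checked by hand. The sketch below computes the limits with NumPy and compares them with `matplotlib.cbook.boxplot_stats`, the helper that returns the statistics used for drawing. The sample data is hypothetical, and the variable names are my own.

```python
import numpy as np
from matplotlib import cbook

# Hypothetical sample data, chosen only for illustration.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 30], dtype=float)
whis = 1.5  # the default reach of the whiskers, in units of IQR

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Limits beyond which points count as outliers.
lower_limit = q1 - whis * iqr
upper_limit = q3 + whis * iqr

# Whiskers end at the most extreme data points still inside the limits,
# not at the limits themselves.
whisker_low = data[data >= lower_limit].min()
whisker_high = data[data <= upper_limit].max()
outliers = data[(data < lower_limit) | (data > upper_limit)]

# The same quantities as computed by matplotlib.
stats = cbook.boxplot_stats(data, whis=whis)[0]
print(whisker_low, stats["whislo"])
print(whisker_high, stats["whishi"])
print(outliers, stats["fliers"])
```

Here the upper limit lies well above the largest non-outlying value, so the upper whisker stops at that value and the single point beyond the limit is drawn as an individual marker.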

    Matplotlib (and Pandas) also gives you a lot of options to change this default definition of the whiskers:

    Set this to an unreasonably high value to force the whiskers to show the min and max values. Alternatively, set this to an ascending sequence of percentiles (e.g., [5, 95]) to set the whiskers at specific percentiles of the data. Finally, whis can be the string 'range' to force the whiskers to the min and max of the data.
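As a rough comparison of these options, the sketch below draws the same hypothetical data with three `whis` settings and reads the whisker ends and outlier points back from the artists returned by `boxplot`. As far as I know, the string form `'range'` is deprecated in newer matplotlib releases, so the sketch assumes the percentile pair `[0, 100]` as a stand-in for it. The `Agg` backend is used only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data, chosen only for illustration.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 30], dtype=float)

settings = {
    "default (1.5)": 1.5,
    "percentiles [5, 95]": [5, 95],
    "full range [0, 100]": [0, 100],
}

fig, axes = plt.subplots(1, len(settings), sharey=True)
results = {}
for ax, (label, whis) in zip(axes, settings.items()):
    bp = ax.boxplot(data, whis=whis)
    ax.set_title(label)
    # Each whisker is a line from the box edge to the whisker end.
    low = float(bp["whiskers"][0].get_ydata()[1])
    high = float(bp["whiskers"][1].get_ydata()[1])
    # Points drawn individually as outliers.
    fliers = sorted(float(v) for v in bp["fliers"][0].get_ydata())
    results[label] = (low, high, fliers)

for label, (low, high, fliers) in results.items():
    print(f"{label}: whiskers {low}..{high}, outliers {fliers}")
```

Note that with a percentile pair the whiskers still end on actual data points inside the percentile limits, so they may not sit exactly at the requested percentiles. Since the pandas boxplot is a wrapper around the matplotlib function, the same `whis` values should be accepted there as well.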
