Matplotlib histogram not counting correctly the number of values in each bin

Posted by 独自空忆成欢 on 2021-01-29 12:43:46

Question


I am trying to make a very simple histogram with matplotlib.pyplot.hist, and it does not seem to count the number of values in each bin correctly. Here is my code:

    import numpy as np
    import matplotlib.pyplot as plt
    plt.hist([.2,.3,.5,.6],bins=np.arange(0,1.1,.1))

I am dividing the interval [0,1] into bins of width .1, so I should get four bars of height 1. But the output figure consists of only two bars of height 2: it counts the .3 value as part of the [.2,.3) bin and, similarly, the .6 value as part of the [.5,.6) bin. I have tried it on both Spyder and Google Colab. Does anyone know what's going on? Thanks!


Answer 1:


The problem is that the values fall exactly on the boundaries of the bins. Because of floating-point rounding, a computed boundary can end up slightly above or below the data value, which puts the value in either the previous or the next bin. You need bin boundaries that lie well in between the data points. Note that matplotlib's histogram is primarily meant for continuous distributions, where floating-point rounding doesn't have such large effects.

Here is some code to illustrate what's happening in both situations:

    import numpy as np
    import matplotlib.pyplot as plt

    data = [.2, .3, .5, .6]

    fig, axes = plt.subplots(ncols=2, figsize=(12, 4))

    for ax in axes:
        if ax == axes[0]:
            # left plot: bin edges coincide with the data values
            bins = np.arange(0, 1.1, .1)
            ax.set_title('data on bin boundaries')
        else:
            # right plot: edges shifted by half a bin width, data sit mid-bin
            bins = np.arange(-0.05, 1.1, .1)
            ax.set_title('data between bin boundaries')
        values, bin_bounds, bars = ax.hist(data, bins=bins, alpha=0.3)

        # dotted red lines mark the bin edges, green dots mark the data
        ax.vlines(bin_bounds, 0, max(values), color='crimson', ls=':')
        ax.scatter(data, np.full_like(data, 0.5), color='lime', s=30)
        ax.set_ylim(0, 2.2)
        ax.set_yticks(range(3))
    plt.show()
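To see the rounding without a plot, one option is to print the edges at full precision and compare them with the data. This is only a minimal sketch: the names `edges`, `k` and `counts` are chosen here for illustration, and it uses np.histogram on the assumption that it bins the data the same way plt.hist does (the matplotlib docs describe hist as using numpy.histogram).

```python
import numpy as np

data = [0.2, 0.3, 0.5, 0.6]
edges = np.arange(0, 1.1, 0.1)  # same edges as in the question

# Show each edge at full precision; some are not exactly k/10.
print([float(e) for e in edges])

# Compare each data value with the edge it nominally sits on.
for value in data:
    k = round(value * 10)        # index of the nominally matching edge
    edge = float(edges[k])
    print(f"value {value!r} vs edge {edge!r}: value < edge is {value < edge}")

# Count per bin without drawing anything.
counts, _ = np.histogram(data, bins=edges)
print(counts.tolist())
```

For .3 and .6 the computed edge comes out slightly larger than the value, so those two values land in the bin to the left, which matches the two bars of height 2 from the question.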




Answer 2:


From the docs:

If bins is a sequence, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin; in this case, bins may be unequally spaced. All but the last (righthand-most) bin is half-open. In other words, if bins is:

[1, 2, 3, 4]

then the first bin is [1, 2) (including 1, but excluding 2) and the second [2, 3). The last bin, however, is [3, 4], which includes 4.
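The quoted rule can be checked with a small sketch. It uses np.histogram rather than plt.hist so nothing has to be drawn; the sample values are made up here and are integers on purpose, so no rounding is involved.

```python
import numpy as np

# One sample on every edge of the bins [1, 2), [2, 3), [3, 4]
samples = [1, 2, 3, 4]
counts, edges = np.histogram(samples, bins=[1, 2, 3, 4])

# 1 goes to [1, 2), 2 goes to [2, 3), and both 3 and 4 go to [3, 4]
print(counts.tolist())  # [1, 1, 2]
```

So a value that sits exactly on an inner edge is counted in the bin to its right, and only the last bin also includes its right edge.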

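Because the intervals are half-open (closed on the left, open on the right), a value sitting on an edge belongs to exactly one of the two neighboring bins, and a tiny floating-point difference between the value and the computed edge decides which one. Here that puts both .2 and .3 in the same bin, and .5 and .6 in another bin.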

You should fix the bins by moving the boundaries a little to avoid the numbers falling on the edges.
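One possible way to move the boundaries, sketched here under the assumption that the data only take values on a 0.1 grid, is to center each bin on a grid point so that the edges fall halfway between possible values. The names `edges`, `centers` and `counts` are illustrative only.

```python
import numpy as np

data = [0.2, 0.3, 0.5, 0.6]

# Edges at -0.05, 0.05, ..., 1.05: every multiple of 0.1 in [0, 1]
# sits in the middle of its own bin, far from any edge.
edges = (np.arange(12) - 0.5) / 10
centers = (edges[:-1] + edges[1:]) / 2

counts, _ = np.histogram(data, bins=edges)
for center, count in zip(centers, counts):
    print(f"bin centered at {center:.1f}: {count}")
```

The same `edges` array can be passed as bins= to plt.hist. A rounding error of about 1e-16 in an edge no longer matters, because each value is 0.05 away from the nearest boundary.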



Source: https://stackoverflow.com/questions/63679190/matplotlib-histogram-not-counting-correctly-the-number-of-values-in-each-bin
