assigning points to bins

梦毁少年i 2020-12-28 19:49

What is a good way to bin numerical values into a certain range? For example, suppose I have a list of values and I want to bin them into N bins by their range. Right now,

2 answers
  •  旧时难觅i
    2020-12-28 20:18

    numpy.histogram() does exactly what you want.

    In current NumPy releases the function signature is:

    numpy.histogram(a, bins=10, range=None, density=None, weights=None)
    

    We're mostly interested in a and bins. a is the input data to be binned. bins can be either the number of bins (your num_bins) or a sequence of scalars giving the bin edges (each bin is half-open, except the last).

    import numpy
    values = numpy.arange(10, dtype=int)
    bins = numpy.arange(-1, 11)
    freq, bins = numpy.histogram(values, bins)
    # freq is now [0 1 1 1 1 1 1 1 1 1 1]
    # bins is unchanged
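
The example above passes explicit edges. As a minimal sketch of the other form, an integer bin count, the snippet below uses made-up sample data and a hypothetical num_bins of 3. Without range, the edges span the minimum and maximum of the data; with range, the span is fixed up front.

```python
import numpy as np

# Made-up sample data, for illustration only.
values = np.array([0.5, 1.2, 2.7, 3.3, 4.9, 6.1, 7.8, 9.0])
num_bins = 3  # hypothetical N from the question

# Integer `bins`, no `range`: equal-width bins from min(values) to max(values).
freq, edges = np.histogram(values, bins=num_bins)
print(freq)        # [4 2 2]
print(len(edges))  # 4, i.e. num_bins + 1 edges

# Explicit `range`: equal-width bins over (0, 12) regardless of the data.
freq_fixed, edges_fixed = np.histogram(values, bins=num_bins, range=(0.0, 12.0))
print(freq_fixed)   # [4 3 1]
print(edges_fixed)  # [ 0.  4.  8. 12.]
```

Either way, numpy.histogram returns one more edge than there are bins.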
    

    To quote the documentation:

    All but the last (righthand-most) bin is half-open. In other words, if bins is:

    [1, 2, 3, 4]
    

    then the first bin is [1, 2) (including 1, but excluding 2) and the second [2, 3). The last bin, however, is [3, 4], which includes 4.
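
A quick way to see the closed last bin is to place one value exactly on each edge. This is only an illustrative check using the edges from the quote:

```python
import numpy as np

edges = [1, 2, 3, 4]
values = [1, 2, 3, 4]  # each value sits exactly on an edge

freq, _ = np.histogram(values, bins=edges)
# 1 falls in [1, 2), 2 falls in [2, 3), and both 3 and 4 fall in [3, 4].
print(freq)  # [1 1 2]
```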

    Edit: You want to know the index in your bins of each element. For this, you can use numpy.digitize(). If your bins are going to be integral, you can use numpy.bincount() as well.

    >>> values = numpy.random.randint(0, 20, 10)
    >>> values
    array([17, 14,  9,  7,  6,  9, 19,  4,  2, 19])
    >>> bins = numpy.linspace(-1, 21, 23)
    >>> bins
    array([ -1.,   0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,
            10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,
            21.])
    >>> pos = numpy.digitize(values, bins)
    >>> pos
    array([19, 16, 11,  9,  8, 11, 21,  6,  4, 21])
    

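    numpy.digitize() returns the index i for which bins[i-1] <= x < bins[i]. Since each interval is open on the upper limit, pos-1 is the index of the left edge and the indices are correct: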

    >>> (bins[pos-1] == values).all()
    True
    >>> for n in range(len(values)):
    ...     print("%g <= %g < %g"
    ...           % (bins[pos[n]-1], values[n], bins[pos[n]]))
    17 <= 17 < 18
    14 <= 14 < 15
    9 <= 9 < 10
    7 <= 7 < 8
    6 <= 6 < 7
    9 <= 9 < 10
    19 <= 19 < 20
    4 <= 4 < 5
    2 <= 2 < 3
    19 <= 19 < 20
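
For the numpy.bincount() route mentioned above, here is a minimal sketch that reuses the same values and bins. It assumes you want per-bin counts comparable to numpy.histogram(); the in_range mask and the minlength argument are choices made for this illustration, not part of the original answer.

```python
import numpy as np

values = np.array([17, 14, 9, 7, 6, 9, 19, 4, 2, 19])
bins = np.linspace(-1, 21, 23)

# digitize returns 1-based positions; 0 and len(bins) mean "outside the edges".
pos = np.digitize(values, bins)
in_range = (pos > 0) & (pos < len(bins))

# Counting the zero-based positions gives one count per bin.
counts = np.bincount(pos[in_range] - 1, minlength=len(bins) - 1)

# For data strictly inside the edges this matches numpy.histogram.
freq, _ = np.histogram(values, bins)
print(np.array_equal(counts, freq))  # True

# Values below the first edge map to 0, values at or above the last edge
# map to len(bins).
outside = np.digitize([-5, 25], bins)
print(outside)  # [ 0 23]
```

One difference to keep in mind: a value equal to the last edge is counted in the last bin by numpy.histogram(), while numpy.digitize() with default arguments reports it as len(bins), so the mask above would drop it.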
    
