Python: Frequency of occurrences

难免孤独 2021-01-01 18:05

I have a list of integers and want to get the frequency of each integer. This was discussed here.

The problem is that the approach I'm using gives me the frequency of floating-point numbers

3 answers
  • 2021-01-01 18:32

    (Late to the party, just thought I would add a seaborn implementation)

    Seaborn implementation for the above question:

    seaborn.__version__ = 0.9.0 at time of writing.

    Load the libraries and set up mock data.

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    data = np.array([3]*10 + [5]*20 + [7]*5 + [9]*27 + [11]*2)
    

    Plot the data using seaborn.distplot:

    Using specified bins, calculated as per the above question.

    # max()+2: the last bin is closed, so max()+1 would lump the largest value in with max()-1
    sns.distplot(data, bins=np.arange(data.min(), data.max()+2), kde=False, hist_kws={"align": "left"})
    plt.show()
    

    Trying numpy built-in binning methods

    I used the "doane" binning method below, which produced integer bins here. It might be worth trying out the other standard binning methods from numpy.histogram_bin_edges, as this is how matplotlib.pyplot.hist() bins the data.

    sns.distplot(data,bins="doane",kde=False,hist_kws={"align" : "left"})
    plt.show()
    

    Produces the histogram below:

    [Figure: histogram of the mock data binned with bins="doane"]
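    To try that suggestion without drawing anything, one option is to print the edges each estimator would produce for the mock data. This is only a sketch: the four methods listed are an arbitrary selection from the string estimators np.histogram_bin_edges accepts, and the number of bins each one picks depends on the data (and possibly on the NumPy version).

```python
import numpy as np

# Same mock data as above
data = np.array([3]*10 + [5]*20 + [7]*5 + [9]*27 + [11]*2)

# A few of the string estimators accepted by np.histogram_bin_edges
# (and therefore by the `bins=` argument of np.histogram and plt.hist)
methods = ["auto", "doane", "sturges", "sqrt"]

edges_by_method = {}
for method in methods:
    edges = np.histogram_bin_edges(data, bins=method)
    edges_by_method[method] = edges
    # Whole-number edges line up with the integer data values
    all_integer = np.allclose(edges, np.round(edges))
    print(f"{method:>8}: {len(edges) - 1} bins, integer edges: {all_integer}")
```

    All of these estimators only choose the number of equal-width bins; the edges always run from data.min() to data.max().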

  • 2021-01-01 18:41

    You can use groupby from itertools, as shown in "How to count the frequency of the elements in a list?":

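    import numpy as np
    from itertools import groupby
    
    data = np.array([3]*10 + [5]*20 + [7]*5 + [9]*27 + [11]*2)  # example data
    # groupby only groups consecutive equal items, so the data must be sorted first
    freq = {key: len(list(group)) for key, group in groupby(np.sort(data))}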
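    Two other common ways to build the same value-to-count mapping are collections.Counter from the standard library, which does not need the data sorted, and np.unique with return_counts=True. A minimal sketch, reusing the mock data from the seaborn answer:

```python
from collections import Counter

import numpy as np

# Same mock data as in the seaborn answer
data = np.array([3]*10 + [5]*20 + [7]*5 + [9]*27 + [11]*2)

# Option 1: Counter tallies items in any order
counter_freq = Counter(data.tolist())

# Option 2: np.unique returns the sorted distinct values and how often each occurs
values, counts = np.unique(data, return_counts=True)
unique_freq = dict(zip(values.tolist(), counts.tolist()))

print(unique_freq)  # {3: 10, 5: 20, 7: 5, 9: 27, 11: 2}
```

    Calling .tolist() first gives plain Python ints as keys rather than NumPy scalars.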
    
  • 2021-01-01 18:43

    If you don't specify which bins to use, np.histogram and pyplot.hist use a default setting of 10 equal-width bins. The left border of the first bin is the smallest value and the right border of the last bin is the largest.

    This is why the bin borders are floating-point numbers. You can use the bins keyword argument to enforce another choice of bins, e.g.:

    # max()+2: the last bin is closed, so max()+1 would lump the largest value in with max()-1
    plt.hist(data, bins=np.arange(data.min(), data.max()+2))
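    The effect of the default binning can be checked without plotting, since pyplot.hist bins the data with np.histogram. A minimal sketch using the mock data from the seaborn answer; the upper limit is data.max() + 2 because NumPy's last bin is closed on both sides, so data.max() + 1 would put the largest value in the same bin as data.max() - 1:

```python
import numpy as np

data = np.array([3]*10 + [5]*20 + [7]*5 + [9]*27 + [11]*2)

# Default: 10 equal-width bins spanning [data.min(), data.max()],
# so the edges are 3.0, 3.8, 4.6, ..., 11.0
default_counts, default_edges = np.histogram(data)

# Integer edges 3, 4, ..., 12: bin [k, k+1) holds exactly the value k
# (the last bin, [11, 12], is closed but only 11 falls into it)
int_counts, int_edges = np.histogram(data, bins=np.arange(data.min(), data.max() + 2))

print(default_edges)
print(dict(zip(int_edges[:-1].tolist(), int_counts.tolist())))
# {3: 10, 4: 0, 5: 20, 6: 0, 7: 5, 8: 0, 9: 27, 10: 0, 11: 2}
```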
    

    Edit: the easiest way to shift all bins to the left is probably just to subtract 0.5 from all bin borders:

    # max()+2 so the last edge is max()+0.5 and the largest value is still counted
    plt.hist(data, bins=np.arange(data.min(), data.max()+2)-0.5)
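    A quick numeric check of the shifted edges, again with the seaborn answer's mock data, shows that each bin is centred on one integer. It also shows why the upper limit should be data.max() + 2 here: with data.max() + 1 the last edge is data.max() - 0.5, so the largest values fall outside every bin.

```python
import numpy as np

# Same mock data as in the seaborn answer
data = np.array([3]*10 + [5]*20 + [7]*5 + [9]*27 + [11]*2)

# Edges 2.5, 3.5, ..., 11.5: every bin is centred on one integer
counts, edges = np.histogram(data, bins=np.arange(data.min(), data.max() + 2) - 0.5)
centres = (edges[:-1] + edges[1:]) / 2

for value, count in zip(centres.astype(int).tolist(), counts.tolist()):
    print(value, count)

# With data.max() + 1 the last edge is 10.5, so the two 11s are not counted
short_counts, _ = np.histogram(data, bins=np.arange(data.min(), data.max() + 1) - 0.5)
print(short_counts.sum(), "of", data.size, "values counted")  # 62 of 64 values counted
```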
    

    Another way to achieve the same effect (not equivalent if non-integers are present):

    plt.hist(data, bins=np.arange(data.min(), data.max()+2), align='left')
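    To illustrate the caveat about non-integers, here is a small hypothetical dataset containing one value, 3.7, counted under both binnings with np.histogram. align='left' only changes where each bar is drawn (centred on its left edge), not which values go into it, so the counts below correspond to the bars at 3, 4 and 5.

```python
import numpy as np

# Hypothetical data containing one non-integer value, 3.7
data = np.array([3, 3, 3.7, 5])

integer_edges = np.arange(data.min(), data.max() + 2)   # 3, 4, 5, 6
shifted_edges = integer_edges - 0.5                     # 2.5, 3.5, 4.5, 5.5

# Integer edges (drawn with align='left'): 3.7 lands in [3, 4), the bar at 3
left_counts, _ = np.histogram(data, bins=integer_edges)
# Edges shifted by -0.5: 3.7 lands in [3.5, 4.5), the bar at 4
shifted_counts, _ = np.histogram(data, bins=shifted_edges)

print(left_counts.tolist())     # [3, 0, 1]
print(shifted_counts.tolist())  # [2, 1, 1]
```

    Roughly speaking, the -0.5 shift assigns each value to the nearest integer, while integer edges with align='left' assign it to the integer below.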
    