Matplotlib: How to convert a histogram to a discrete probability mass function?

依然范特西╮ submitted on 2019-12-06 07:38:08

Question


I have a question regarding the hist() function with matplotlib.

I am writing code to plot a histogram of data whose values vary from 0 to 1. For example:

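import numpy as np
import matplotlib.pyplot as plt

values = [0.21, 0.51, 0.41, 0.21, 0.81, 0.99]

bins = np.arange(0, 1.1, 0.1)  # 11 edges, i.e. 10 bins of width 0.1 on [0, 1]
# a = counts per bin, b = bin edges, c = bar patches
# (density=False is what older Matplotlib versions called normed=0)
a, b, c = plt.hist(values, bins=bins, density=False)
plt.show()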

The code above generates a correct histogram (I could not post an image since I do not have enough reputation). In terms of frequencies, it looks like:

[0 0 2 0 1 1 0 0 1 1]

I would like to convert this output to a discrete probability mass function, i.e. for the above example, I would like to get the following probabilities:

[ 0.  0.  0.333333333  0.  0.166666667  0.166666667  0.  0.  0.166666667  0.166666667 ]  # each count in the previous array divided by 6

I thought I simply needed to change the parameter in the hist() function to normed=1 (density=True in current Matplotlib). However, I get the following values:

[ 0.  0.  3.33333333  0.  1.66666667  1.66666667  0.  0.  1.66666667  1.66666667 ]

This is not what I expect, and I don't know how to get the discrete probability mass function, whose sum should be 1.0. A similar question was asked here (link to the question), but I do not think it was resolved.

Thanks in advance for your help.


Answer 1:


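The reason is that normed=True (density=True in current versions) gives the probability density function, not the probability of each bin. In probability theory, a probability density function (or density) of a continuous random variable describes the relative likelihood for this random variable to take on a given value. For a histogram, this means each count is divided both by the total number of samples and by the bin width, so that the area under the histogram, not the sum of the bar heights, equals 1.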
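In symbols (notation introduced here only for illustration): with N samples, c_k samples in bin k and bin width Δ_k, and assuming every sample falls inside one of the bins, density normalization gives

```latex
\hat{f}_k = \frac{c_k}{N\,\Delta_k}, \qquad
\sum_k \hat{f}_k\,\Delta_k = \frac{1}{N}\sum_k c_k = 1, \qquad
p_k = \frac{c_k}{N} = \hat{f}_k\,\Delta_k
```

For the question's data, the bin [0.2, 0.3) has c_k = 2, N = 6 and Δ_k = 0.1, so the density is 2/(6 × 0.1) ≈ 3.33 (the value normed=1 returned), while the probability mass is 2/6 ≈ 0.333 (the value the question asks for).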

Let us consider a very simple example.

import numpy as np

x = np.arange(0.1, 1.1, 0.1)
# array([ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

# Bin width 0.1 (density=True was formerly normed=1)
bins = np.arange(0.05, 1.15, 0.1)
np.histogram(x, bins=bins, density=True)[0]
# [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]
np.histogram(x, bins=bins, density=False)[0] / float(len(x))
# [ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1]

# Change the bin width to 0.2
bins = np.arange(0.05, 1.15, 0.2)
np.histogram(x, bins=bins, density=True)[0]
# [ 1.,  1.,  1.,  1.,  1.]
np.histogram(x, bins=bins, density=False)[0] / float(len(x))
# [ 0.2,  0.2,  0.2,  0.2,  0.2]

As you can see above, the probability that x lies in [0.05, 0.15) or in [0.15, 0.25) is 1/10, whereas with a bin width of 0.2 the probability that it lies in [0.05, 0.25) or in [0.25, 0.45) is 1/5. These probability values depend on the bin width, but the probability density does not (it is 1 in both cases). That is why the density is the proper quantity to plot here; otherwise one would need to state the bin width on every plot.
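The bin-width dependence can be checked directly. A minimal sketch, assuming NumPy's current `density=True` keyword: multiplying the density by each bin's width (`np.diff(edges)`) recovers the per-bin probabilities (0.1 or 0.2 here), while the density stays at 1 and the total area stays at 1 for both widths.

```python
import numpy as np

x = np.arange(0.1, 1.1, 0.1)  # the ten points 0.1, 0.2, ..., 1.0 from the example

results = {}
for width in (0.1, 0.2):
    bins = np.arange(0.05, 1.15, width)
    density, edges = np.histogram(x, bins=bins, density=True)
    prob = density * np.diff(edges)  # density times bin width = probability per bin
    area = prob.sum()                # total area under the density histogram
    results[width] = (density, prob, area)
    print(width, density.round(3), prob.round(3), area)
```

The probabilities change with the bin width; the density and the total area do not.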

So in your case, if you really want to plot the probability of each bin (rather than the probability density), you can simply divide the count in each bin by the total number of elements. However, I would suggest not doing this unless you are working with a discrete variable and each of your bins represents a single possible value of that variable.
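A minimal sketch of that division using the question's data, in two equivalent ways. Option 2 uses the `weights` argument of `plt.hist` (giving each sample weight 1/N); that is a swapped-in alternative, not something the answer above spells out.

```python
import numpy as np
import matplotlib.pyplot as plt

values = [0.21, 0.51, 0.41, 0.21, 0.81, 0.99]
bins = np.arange(0, 1.1, 0.1)

# Option 1: divide the raw counts by the number of samples
counts, edges = np.histogram(values, bins=bins)
pmf = counts / len(values)

# Option 2: let plt.hist do it by giving every sample the weight 1/N,
# so each bar height is (count in bin) / N
weights = np.ones(len(values)) / len(values)
heights, _, _ = plt.hist(values, bins=bins, weights=weights)
plt.ylabel("probability per bin")
plt.show()
```

Both give bar heights that sum to 1 (up to floating-point rounding), regardless of the bin width.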




Answer 2:


Plotting a Continuous Probability Function (PDF) from a Histogram – Solved in Python: refer to this blog post for a detailed explanation (http://howdoudoittheeasiestway.blogspot.com/2017/09/plotting-continuous-probability.html). Otherwise, you can use the code below, where A is your data array.

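import numpy as np
import matplotlib.pyplot as plt

A = np.random.normal(size=1000)  # example data (assumption); replace with your own samples

n, bins = np.histogram(A, bins=40)  # raw counts in 40 equal-width bins
n = n / len(A)                      # probability per bin; sums to 1
width = bins[1] - bins[0]           # common bin width
centers = (bins[:-1] + bins[1:]) / 2
mu = np.mean(A)                     # fit the normal curve to the data, not to the bar heights
sigma = np.std(A)
plt.bar(centers, n, width=width)
# Normal PDF times the bin width approximates the probability per bin
# (the original hard-coded 0.03 as this factor)
y1 = width / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(centers - mu) ** 2 / (2 * sigma ** 2))
plt.plot(centers, y1, 'r--', linewidth=2)
plt.show()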


Source: https://stackoverflow.com/questions/11750276/matplotlib-how-to-convert-a-histogram-to-a-discrete-probability-mass-function
