How to normalize a histogram in Python?

Submitted by 爱⌒轻易说出口 on 2020-01-02 04:31:20

Question


I'm trying to plot a normalized histogram, but instead of getting 1 as the maximum value on the y-axis, I get different numbers.

For the array k=(1,4,3,1):

import numpy as np
import matplotlib.pyplot as plt

def plotGraph():
    k = (1, 4, 3, 1)
    plt.hist(k, normed=1)
    plt.xticks(np.arange(10))  # 10 ticks on x axis
    plt.show()

plotGraph()

I get a histogram that doesn't look normalized.

For a different array, k=(3,3,3,3):

import numpy as np
import matplotlib.pyplot as plt

def plotGraph():
    k = (3, 3, 3, 3)
    plt.hist(k, normed=1)
    plt.xticks(np.arange(10))  # 10 ticks on x axis
    plt.show()

plotGraph()

I get a histogram whose maximum y-value is 10.

For different k, I get a different maximum y-value even though normed=1 or normed=True.

Why does the normalization (if it works at all) change with the data, and how can I make the maximum y-value equal to 1?

UPDATE:

I am trying to implement Carsten König's answer from "Plotting histograms whose bar heights sum to 1 in matplotlib" and am getting a very weird result:

import numpy as np
import matplotlib.pyplot as plt

def plotGraph():
    k = (1, 4, 3, 1)
    weights = np.ones_like(k)/len(k)
    plt.hist(k, weights=weights)
    plt.xticks(np.arange(10))  # 10 ticks on x axis
    plt.show()

plotGraph()

Result:

What am I doing wrong?

Thanks


Answer 1:


When you plot a normalized histogram, it is not the bar heights that sum to one but the area under the bars (each height multiplied by its bin width):

In [44]:

import numpy as np
import matplotlib.pyplot as plt
k = (3, 3, 3, 3)
x, bins, p = plt.hist(k, density=True)  # used to be normed=True in older versions
plt.xticks(np.arange(10))  # 10 ticks on x axis
plt.show()
In [45]:

print(bins)
[ 2.5  2.6  2.7  2.8  2.9  3.   3.1  3.2  3.3  3.4  3.5]

In this example the bin width is 0.1 and the only non-empty bin has height 10, so the area under the histogram is 0.1*10 = 1.

To make the bar heights sum to 1 instead, add the following before plt.show():

for item in p:
    item.set_height(item.get_height()/sum(x))
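
The difference between the two normalizations can be checked numerically without plotting. The following is a minimal sketch using np.histogram, which computes the same kind of bin heights that plt.hist draws; the helper name summarize and its return keys are made up for this illustration.

```python
import numpy as np

def summarize(data, bins=10):
    """Return density heights, their total area, and relative frequencies."""
    # density=True scales heights so that sum(height * bin_width) == 1
    density, edges = np.histogram(data, bins=bins, density=True)
    widths = np.diff(edges)
    area = float(np.sum(density * widths))

    # A weight of 1/N per sample makes the bar heights themselves sum to 1
    weights = np.ones(len(data)) / len(data)
    rel_freq, _ = np.histogram(data, bins=bins, weights=weights)
    return {"density": density, "area": area, "rel_freq": rel_freq}

for k in [(1, 4, 3, 1), (3, 3, 3, 3)]:
    s = summarize(k)
    print(k, "max density:", s["density"].max(),
          "area:", s["area"], "sum of rel. freq.:", s["rel_freq"].sum())
```

For k=(3,3,3,3) the maximum density comes out at about 10 while the area stays at 1, which matches the y-axis value reported in the question.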




Answer 2:


One way is to get the probabilities on your own, and then plot with plt.bar:

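In [91]: from collections import Counter
    ...: import matplotlib.pyplot as plt
    ...: k = (1, 4, 3, 1)
    ...: c = Counter(k)
    ...: print(c)
Counter({1: 2, 4: 1, 3: 1})

In [92]: prob = {value: count / len(k) for value, count in c.items()}  # counts -> probabilities
    ...: plt.bar(list(prob.keys()), list(prob.values()))
    ...: plt.show()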

result:




Answer 3:


How should these lines:

weights = np.ones_like(myarray)/float(len(myarray))
plt.hist(myarray, weights=weights)

work when I have a stacked histogram like this?

n, bins, patches = plt.hist([from6to10, from10to14, from14to18, from18to22, from22to6],
    label=['06:00-10:00', '10:00-14:00', '14:00-18:00', '18:00-22:00', '22:00-06:00'],
    stacked=True, edgecolor='black', alpha=0.8, linewidth=0.5,
    range=(np.nanmin(ref1arr), np.nanmax(ref1arr)), bins=10)
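
One possible way to handle the stacked case, sketched here under assumptions: plt.hist accepts a list of weight arrays, one per dataset, so each sample can be weighted by 1 divided by the total number of samples across all datasets. The data below is randomly generated for illustration, groups is a hypothetical stand-in for the five arrays in the call above, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for from6to10, from10to14, ... (different lengths on purpose)
groups = [rng.normal(20, 5, size=n) for n in (50, 80, 120, 60, 30)]
labels = ['06:00-10:00', '10:00-14:00', '14:00-18:00', '18:00-22:00', '22:00-06:00']

total = sum(len(g) for g in groups)
# One weight array per dataset; every sample contributes 1/total to its bar
weights = [np.ones_like(g, dtype=float) / total for g in groups]

all_values = np.concatenate(groups)
n, bins, patches = plt.hist(groups, weights=weights, label=labels, stacked=True,
                            edgecolor='black', alpha=0.8, linewidth=0.5,
                            range=(all_values.min(), all_values.max()), bins=10)
plt.legend()

# With stacked=True the returned heights are cumulative, so n[-1] is the top of each stack
stack_tops = np.asarray(n[-1])
print(stack_tops.sum())
# Call plt.show() or plt.savefig(...) here to view the figure
```

Dividing by the combined sample count, rather than by each dataset's own length, is what makes the full stack sum to 1; normalizing each dataset separately would instead make the stack sum to the number of datasets.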



Answer 4:


A normed histogram is defined such that the sum of the products of the width and height of each column equals 1, not such that the tallest column equals 1. That's why your maximum is not equal to one.

However, if you still want to force the maximum to be 1, you could use numpy and matplotlib.pyplot.bar in the following way:

import numpy as np
import matplotlib.pyplot as plt

sample = np.random.normal(0, 10, 100)
# generate bin boundaries and heights
bin_height, bin_boundary = np.histogram(sample, bins=10)
# define the width of each column
width = bin_boundary[1] - bin_boundary[0]
# scale each column by dividing by the maximum height
bin_height = bin_height / float(max(bin_height))
# plot; align='edge' because bin_boundary[:-1] holds the left edges
plt.bar(bin_boundary[:-1], bin_height, width=width, align='edge')
plt.show()



Answer 5:


You could use the solution outlined here:

weights = np.ones_like(myarray)/float(len(myarray))
plt.hist(myarray, weights=weights)
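
A likely explanation for the odd result in the question's update, offered as an assumption since the plot itself is not preserved: np.ones_like(k) on a tuple of integers returns an integer array, and under Python 2 dividing it by the integer len(k) floors every weight to 0, which is why the snippet above uses float(len(myarray)). The sketch below reproduces that with explicit floor division (//) so it behaves the same on Python 3; the variable names are illustrative.

```python
import numpy as np

k = (1, 4, 3, 1)

# Integer weights: floor division mimics Python 2's "/" on integer arrays
int_weights = np.ones_like(k) // len(k)
# Float weights: each sample contributes 1/N
float_weights = np.ones_like(k) / float(len(k))

heights_int, _ = np.histogram(k, bins=10, weights=int_weights)
heights_float, _ = np.histogram(k, bins=10, weights=float_weights)

print(int_weights)          # every weight is 0, so all bars are empty
print(heights_float.sum())  # bar heights add up to 1
```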


Source: https://stackoverflow.com/questions/22241240/how-to-normalize-a-histogram-in-python
