Fitting a Gaussian to a histogram with MatPlotLib and Numpy - wrong Y-scaling?

梦毁少年i 2020-12-21 07:58

I have written the below code to fit a Gaussian curve to a histogram. It seems to work, although the Y scaling is different. What am I doing wrong?

import ma         


        
1 answer
  • 2020-12-21 08:20

    You need to normalize the histogram, since the distribution you plot is also normalized:

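    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.stats import norm
    
    arr = np.random.randn(100)
    
    plt.figure(1)
    # density=True scales the bars so that their total area is 1
    plt.hist(arr, density=True)
    plt.xlim((min(arr), max(arr)))
    
    # Estimate the Gaussian parameters from the sample
    mean = np.mean(arr)
    variance = np.var(arr)
    sigma = np.sqrt(variance)
    x = np.linspace(min(arr), max(arr), 100)
    plt.plot(x, norm.pdf(x, mean, sigma))
    
    plt.show()
    

    Note the density=True in the call to plt.hist (older Matplotlib versions called this argument normed, which has since been removed). Likewise, scipy.stats.norm.pdf stands in for matplotlib.mlab.normpdf, which has also been removed from Matplotlib. Note also that I changed your sample data, because the histogram looks weird with too few data points.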
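
    The normalization can be checked numerically without plotting. The following is a minimal sketch, assuming NumPy's np.histogram with density=True applies the same normalization as the histogram plot; the seed and the bin count of 10 are arbitrary choices for illustration, not part of the original question.

```python
import numpy as np

# Fixed seed, chosen only so the sketch is reproducible (an assumption).
rng = np.random.default_rng(0)
arr = rng.standard_normal(1000)

# density=True divides each count by (number of samples * bin width).
heights, edges = np.histogram(arr, bins=10, density=True)
widths = np.diff(edges)

# Total area of the bars: sum of height * width over all bins.
area = float(np.sum(heights * widths))
print(area)
```

    The total bar area comes out as 1 up to floating-point rounding, which is why a normalized Gaussian PDF lines up with the normalized histogram.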

    If you instead want to keep the original histogram and adjust the distribution, you have to scale the distribution so that its integral equals the integral of the histogram, i.e. the number of items in the list multiplied by the width of a bar. This can be done as follows:

    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.stats import norm  # replaces matplotlib.mlab.normpdf, which was removed from Matplotlib
    
    arr = np.random.randn(1000)
    
    plt.figure(1)
    # result is a tuple (counts, bin_edges, patches)
    result = plt.hist(arr)
    plt.xlim((min(arr), max(arr)))
    
    mean = np.mean(arr)
    variance = np.var(arr)
    sigma = np.sqrt(variance)
    x = np.linspace(min(arr), max(arr), 100)
    # Width of one bar, taken from the first two bin edges
    dx = result[1][1] - result[1][0]
    scale = len(arr)*dx
    plt.plot(x, norm.pdf(x, mean, sigma)*scale)
    
    plt.show()
    

    Note the scale factor calculated from the number of items times the width of a single bar.
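
    As a rough check of that scale factor, the sketch below compares the area of an unnormalized histogram with the area under the scaled Gaussian. It assumes equal-width bins; the seed, the bin count, the helper name gaussian_pdf, and the integration range of 8 standard deviations are illustrative choices, not taken from the original code.

```python
import numpy as np

# Fixed seed, chosen only so the sketch is reproducible (an assumption).
rng = np.random.default_rng(0)
arr = rng.standard_normal(1000)

# Raw counts with equal-width bins, as in an unnormalized histogram.
counts, edges = np.histogram(arr, bins=10)
dx = edges[1] - edges[0]
scale = len(arr) * dx

# Area of the histogram bars: sum of count * width.
hist_area = float(np.sum(counts * np.diff(edges)))


def gaussian_pdf(x, mean, sigma):
    """Normal probability density (hypothetical helper, written with NumPy only)."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


mean, sigma = arr.mean(), arr.std()

# Integrate the scaled curve over a wide range with a simple Riemann sum.
x = np.linspace(mean - 8 * sigma, mean + 8 * sigma, 10001)
curve = gaussian_pdf(x, mean, sigma) * scale
curve_area = float(np.sum(curve) * (x[1] - x[0]))

print(hist_area, curve_area, scale)
```

    Both areas agree with len(arr)*dx up to numerical error, so the scaled curve and the raw-count histogram share the same Y scale.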
