Gaussian fit to histogram data in Python: Trust Region vs. Levenberg-Marquardt

Submitted by 纵然是瞬间 on 2019-12-05 05:17:12

Question


My histogram plot clearly shows two peaks, but when I curve-fit it with a double Gaussian, the fit shows just one peak. I have followed almost every relevant answer on Stack Overflow but failed to get the correct result. My teacher previously did this in Fortran and got two peaks. I used leastsq from Python's scipy.optimize in one trial. Should I post my data as well? Here is my code.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq

binss = int((max(x) - min(x))/0.05) #0.05 is my bin width; the bin count must be an integer
n, bins, patches = plt.hist(x, binss, color = 'grey') #gives the histogram

x_avg = (bins[:-1] + bins[1:]) / 2 #bin centers
y_real = n

def gauss(x, A, mu, sigma):
    return A * np.exp(-(x - mu)**2 / (2. * sigma**2)) #np.exp works elementwise on arrays
A1, A2, m1, m2, sd1, sd2 = [25, 30, 0.3, 0.6, -0.9, -0.9]

#Initial guesses for leastsq
p = [A1, A2, m1, m2, sd1, sd2]
y_init = gauss(x_avg, A1, m1, sd1) + gauss(x_avg, A2, m2, sd2)    #initially guessed y

def residual(p, x, y):
    A1, A2, m1, m2, sd1, sd2 = p
    y_fit = gauss(x, A1, m1, sd1) + gauss(x, A2, m2, sd2)
    err = y - y_fit
    return err

sf = leastsq(residual, p, args = (x_avg , y_real))

y_fitted1 = gauss(x_avg, sf[0][0], sf[0][2], sf[0][4])
y_fitted2 = gauss(x_avg, sf[0][1], sf[0][3], sf[0][5])

y_fitted = y_fitted1 + y_fitted2

plt.plot(x_avg, y_init, 'b', label='Starting Guess')
plt.plot(x_avg, y_fitted, color = 'red', label = 'Fitted Data')
plt.plot(x_avg, y_fitted1, color= 'black', label = 'Fitted1 Data')
plt.plot(x_avg, y_fitted2, color = 'green', label = 'Fitted2 Data')
plt.legend()
plt.show()

Even the figure I got is not smooth; x_avg has only 54 points. Please do help. I can't even post the figure here.

When plotting in MATLAB, I obtained the correct results. Reason: MATLAB uses a Trust Region algorithm instead of the Levenberg-Marquardt algorithm, which is not suitable for bound constraints.

The correct results come only when the data is modeled as a sum of 3 individual Gaussians, not 2.

How do I decide which algorithm to use, and when?


Answer 1:


Your problem seems to be about mixtures of Gaussians, also known as a Gaussian mixture model. There are several implementations; the one in sklearn is worth considering.

import numpy as np
from sklearn import mixture
import matplotlib.pyplot as plt

comp0 = np.random.randn(1000) - 5 # samples of the 1st component
comp1 = np.random.randn(1000) + 5 # samples of the 2nd component

x = np.hstack((comp0, comp1)).reshape(-1, 1) # merge them; sklearn expects a 2-D array

# GMM was renamed GaussianMixture in scikit-learn 0.18
gmm = mixture.GaussianMixture(n_components=2) # gmm for two components
gmm.fit(x) # train it!

linspace = np.linspace(-10, 10, 1000).reshape(-1, 1)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

ax1.hist(x.ravel(), 100) # draw samples
ax2.plot(linspace, np.exp(gmm.score_samples(linspace)), 'r') # score_samples returns the log-density directly
plt.show()

The output is a histogram of the samples with the fitted mixture density overlaid in red (the original figure is not preserved here).



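Once the mixture is fitted, the component parameters can be read directly off the model; a minimal sketch (the synthetic data and the random seed below are illustrative, not from the question):

```python
import numpy as np
from sklearn import mixture

rng = np.random.default_rng(0)
x = np.hstack((rng.normal(-5, 1, 1000),    # 1st component
               rng.normal(5, 1, 1000))     # 2nd component
              ).reshape(-1, 1)             # sklearn expects 2-D input

gmm = mixture.GaussianMixture(n_components=2, random_state=0).fit(x)

means = np.sort(gmm.means_.ravel())        # component means
stds = np.sqrt(gmm.covariances_.ravel())   # component standard deviations
weights = gmm.weights_                     # mixing proportions (sum to 1)
```

With well-separated components like these, the fitted means land close to -5 and +5 and the weights close to 0.5 each.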

Answer 2:


I added another Gaussian term, so p took 9 parameters in total. Thus

p = [A1, A2, A3, m1, m2, m3, sd1, sd2, sd3]

Then another term y_fitted3 was defined and added to y_fitted. This gave a correct figure with two peaks fitting perfectly, except that the curve was not smooth at all! Searching on Stack Overflow then led me to use a spline, i.e.

from scipy.interpolate import make_interp_spline # scipy.interpolate.spline was removed in SciPy 1.3

and then, at the end,

x_new = np.linspace(x_avg.min(), x_avg.max(), 30000)
ysmooth = make_interp_spline(x_avg, y_fitted)(x_new)
plt.plot(x_new, ysmooth)
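Since the fitted model is an analytic function, an alternative to interpolating the coarse fit is to evaluate the Gaussians themselves on a dense grid. A self-contained sketch, using stand-in parameter values (the fit would supply the real ones via sf[0]):

```python
import numpy as np

def gauss(x, A, mu, sigma):
    return A * np.exp(-(x - mu)**2 / (2. * sigma**2))

# Coarse bin centers, roughly as in the question (values are illustrative)
x_avg = np.linspace(0.0, 1.0, 54)

# Hypothetical fitted parameters, stand-ins for the leastsq output
A1, m1, sd1 = 25.0, 0.3, 0.05
A2, m2, sd2 = 30.0, 0.6, 0.05

# Evaluate the analytic model on a dense grid -- no interpolation needed
x_new = np.linspace(x_avg.min(), x_avg.max(), 3000)
y_smooth = gauss(x_new, A1, m1, sd1) + gauss(x_new, A2, m2, sd2)
```

This avoids any spline artifacts, since the smooth curve is the model itself rather than an interpolant through 54 points.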

Then there it was. Checking Wikipedia, it says that the L-M implementation in Python also uses a trust-region approach. So trying leastsq again gave the result. But I am still not clear about the difference seen in MATLAB. Extra input would be appreciated! Thank you.
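On the algorithm question: in current SciPy, scipy.optimize.least_squares exposes the choice explicitly. method='lm' wraps MINPACK's Levenberg-Marquardt (efficient for small, unconstrained problems, but no bounds allowed), while method='trf' is a Trust Region Reflective solver that accepts bound constraints. A minimal sketch on synthetic two-peak data (all numbers below are made up for illustration):

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic, noiseless double-Gaussian data
xs = np.linspace(0.0, 1.0, 200)
ys = (25 * np.exp(-(xs - 0.3)**2 / (2 * 0.05**2))
      + 30 * np.exp(-(xs - 0.6)**2 / (2 * 0.05**2)))

def residual(p, x, y):
    A1, A2, m1, m2, s1, s2 = p
    model = (A1 * np.exp(-(x - m1)**2 / (2 * s1**2))
             + A2 * np.exp(-(x - m2)**2 / (2 * s2**2)))
    return model - y

p0 = [20, 20, 0.25, 0.65, 0.1, 0.1]

# Levenberg-Marquardt: no bounds allowed
fit_lm = least_squares(residual, p0, method='lm', args=(xs, ys))

# Trust Region Reflective: bounds keep amplitudes and widths positive
fit_trf = least_squares(residual, p0, method='trf', args=(xs, ys),
                        bounds=([0, 0, 0, 0, 1e-3, 1e-3],
                                [np.inf, np.inf, 1, 1, 1, 1]))
```

As a rule of thumb: use 'lm' for small unconstrained problems, and 'trf' whenever you need bounds (e.g. forcing sigma > 0, which also rules out the negative initial sigmas in the question).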



Source: https://stackoverflow.com/questions/23758714/gaussian-fit-to-a-histogram-data-in-python-trust-region-v-s-levenberg-marquardt
