Python: two-curve Gaussian fitting with non-linear least-squares

野性不改 2020-12-13 07:39

My knowledge of maths is limited, which is probably why I am stuck. I have a spectrum to which I am trying to fit two Gaussian peaks. I can fit to the largest peak, but I cann

3 Answers
  • 2020-12-13 07:51

    This code worked for me, provided that you are only fitting a function that is a combination of two Gaussian distributions.

    I just made a residuals function that adds two Gaussian functions and then subtracts their sum from the real data.

    The parameters (p) that I passed to SciPy's leastsq function are: the mean of the first Gaussian (m), the difference between the means of the first and second Gaussians (dm, i.e. the horizontal shift), the standard deviation of the first (sd1), and the standard deviation of the second (sd2).

    import numpy as np
    from scipy.optimize import leastsq
    import matplotlib.pyplot as plt
    
    ######################################
    # Setting up test data
    def norm(x, mean, sd):
      # Normal probability density, evaluated element-wise on the array x
      return 1.0/(sd*np.sqrt(2*np.pi))*np.exp(-(x - mean)**2/(2*sd**2))
    
    mean1, mean2 = 0, -2
    std1, std2 = 0.5, 1 
    
    x = np.linspace(-20, 20, 500)
    y_real = norm(x, mean1, std1) + norm(x, mean2, std2)
    
    ######################################
    # Solving
    m, dm, sd1, sd2 = [5, 10, 1, 1]
    p = [m, dm, sd1, sd2] # Initial guesses for leastsq
    y_init = norm(x, m, sd1) + norm(x, m + dm, sd2) # For final comparison plot
    
    def res(p, y, x):
      m, dm, sd1, sd2 = p
      m1 = m
      m2 = m1 + dm
      y_fit = norm(x, m1, sd1) + norm(x, m2, sd2)
      err = y - y_fit
      return err
    
    plsq = leastsq(res, p, args = (y_real, x))
    
    y_est = norm(x, plsq[0][0], plsq[0][2]) + norm(x, plsq[0][0] + plsq[0][1], plsq[0][3])
    
    plt.plot(x, y_real, label='Real Data')
    plt.plot(x, y_init, 'r.', label='Starting Guess')
    plt.plot(x, y_est, 'g.', label='Fitted')
    plt.legend()
    plt.show()
    

    Results of the code.
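
    The norm function above has unit area, so the height of each peak is tied to its standard deviation. Peaks in a measured spectrum usually need their own heights as well. A minimal sketch of that variant follows, using scipy.optimize.curve_fit instead of leastsq. The function name two_gaussians, the amplitude parameters a1 and a2, and the synthetic test values are assumptions made for illustration and are not part of the answer above.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gaussians(x, a1, mu1, sd1, a2, mu2, sd2):
    """Sum of two Gaussian peaks with independent heights a1 and a2."""
    g1 = a1 * np.exp(-(x - mu1) ** 2 / (2 * sd1 ** 2))
    g2 = a2 * np.exp(-(x - mu2) ** 2 / (2 * sd2 ** 2))
    return g1 + g2

# Synthetic, noise-free test spectrum (values chosen only for illustration)
x = np.linspace(-10, 10, 400)
true_params = (3.0, 0.0, 0.5, 1.5, -2.0, 1.0)
y = two_gaussians(x, *true_params)

# Rough starting guesses, e.g. read off a plot of the data
p0 = (2.5, 0.2, 0.7, 1.0, -2.5, 1.2)
popt, pcov = curve_fit(two_gaussians, x, y, p0=p0)

# The widths enter the model squared, so their sign is arbitrary
a1, mu1, sd1, a2, mu2, sd2 = popt
print("peak 1:", a1, mu1, abs(sd1))
print("peak 2:", a2, mu2, abs(sd2))
```

    As with leastsq, the result depends on the starting guesses: if they sit far from the real peaks, the fit can settle on a single peak or fail to converge.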

  • 2020-12-13 07:57

    You can use Gaussian mixture models from scikit-learn (the GaussianMixture class in current releases). These are fitted to raw samples, one row per observation, rather than to an (x, y) curve:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import norm
    from sklearn.mixture import GaussianMixture

    # yourdata: array of samples with shape (n_samples, 1)
    clf = GaussianMixture(n_components=2, covariance_type='full')
    clf.fit(yourdata)
    m1, m2 = clf.means_.ravel()
    w1, w2 = clf.weights_
    c1, c2 = clf.covariances_.ravel()  # variances of the two components
    counts, edges, _ = plt.hist(np.ravel(yourdata), 100, density=True)
    # Overlay each weighted component on the normalised histogram
    plt.plot(edges, w1*norm.pdf(edges, m1, np.sqrt(c1)), linewidth=3)
    plt.plot(edges, w2*norm.pdf(edges, m2, np.sqrt(c2)), linewidth=3)
    plt.show()
    


    You can also use the function below to fit any number of Gaussians, set by the ncomp parameter:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import norm
    from sklearn.mixture import GaussianMixture

    def fit_mixture(data, ncomp=2, doplot=False):
        # data: array of samples with shape (n_samples, 1)
        clf = GaussianMixture(n_components=ncomp, covariance_type='full')
        clf.fit(data)
        ms = [m[0] for m in clf.means_]
        cs = [np.sqrt(c[0][0]) for c in clf.covariances_]  # standard deviations
        ws = list(clf.weights_)
        if doplot:
            counts, edges, _ = plt.hist(np.ravel(data), 200, density=True)
            for w, m, c in zip(ws, ms, cs):
                # cs already holds standard deviations, so no second sqrt here
                plt.plot(edges, w*norm.pdf(edges, m, c), linewidth=3)
        return ms, cs, ws
    
  • 2020-12-13 08:10

    Coeffs 0 and 4 are degenerate: there is absolutely nothing in the data that can decide between them. You should use a single zero-level parameter instead of two (i.e. remove one of them from your code). This is probably what is stopping your fit. (Ignore the comments here saying this is not possible; there are clearly at least two peaks in that data and you should certainly be able to fit to that.)

    (It may not be clear why I am suggesting this, but what is happening is that coeffs 0 and 4 can cancel each other out. They can both be zero, or one could be 100 and the other -100; either way, the fit is just as good. This "confuses" the fitting routine, which spends its time trying to work out what they should be when there is no single right answer: whatever value one takes, the other can just offset it, and the fit will be the same.)
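
    To illustrate the cancellation described above, here is a small sketch. It assumes a hypothetical model in which each Gaussian term carries its own additive constant (c0 and c4); the question's actual code is not shown here, so the parameter layout and all numeric values are guesses made for illustration.

```python
import numpy as np

def gauss(x, mu, sd):
    """Unit-height Gaussian bump."""
    return np.exp(-(x - mu) ** 2 / (2 * sd ** 2))

def model(x, c0, a1, mu1, sd1, c4, a2, mu2, sd2):
    # Hypothetical layout: one constant offset per Gaussian term
    return (c0 + a1 * gauss(x, mu1, sd1)) + (c4 + a2 * gauss(x, mu2, sd2))

x = np.linspace(-10, 10, 200)

# Two very different offset pairs with the same sum c0 + c4
y_a = model(x, 0.0, 3.0, 0.0, 0.5, 0.0, 1.5, -2.0, 1.0)
y_b = model(x, 100.0, 3.0, 0.0, 0.5, -100.0, 1.5, -2.0, 1.0)
same = np.allclose(y_a, y_b)
print("identical curves:", same)

# The derivative of the model with respect to c0 and to c4 is 1 everywhere,
# so the two Jacobian columns are identical and only their sum is constrained
J_offsets = np.column_stack([np.ones_like(x), np.ones_like(x)])
rank = np.linalg.matrix_rank(J_offsets)
print("rank of the two offset columns:", rank)
```

    Because only the sum c0 + c4 affects the curve, merging the two constants into one parameter removes the ambiguity without changing what the model can represent.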

    In fact, from the plot, it looks like there may be no need for a zero level at all. I would try dropping both of those and seeing how the fit looks.

    Also, there is no need to fit coeffs 1 and 5 (or the zero point) in the least squares. Instead, because the model is linear in those, you could calculate their values on each iteration. This will make things faster, but is not critical. I just noticed you say your maths is not so good, so probably ignore this one.
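
    The last suggestion can be sketched as follows, assuming that coeffs 1 and 5 are the two peak heights and that the model has a single zero level. For each trial set of means and widths, the heights and zero level are found by ordinary linear least squares, so leastsq only searches over the four non-linear parameters. The helper names and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import leastsq

def design_matrix(x, mu1, sd1, mu2, sd2):
    """Columns: constant zero level, unit-height peak 1, unit-height peak 2."""
    g1 = np.exp(-(x - mu1) ** 2 / (2 * sd1 ** 2))
    g2 = np.exp(-(x - mu2) ** 2 / (2 * sd2 ** 2))
    return np.column_stack([np.ones_like(x), g1, g2])

def solve_linear(p, x, y):
    # Best zero level and heights for the given means and widths
    A = design_matrix(x, *p)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A, coef

def residuals(p, x, y):
    A, coef = solve_linear(p, x, y)
    return A @ coef - y

# Synthetic, noise-free data (values chosen only for illustration)
x = np.linspace(-10, 10, 400)
y = (0.2
     + 3.0 * np.exp(-(x - 0.0) ** 2 / (2 * 0.5 ** 2))
     + 1.5 * np.exp(-(x + 2.0) ** 2 / (2 * 1.0 ** 2)))

# Starting guesses for (mu1, sd1, mu2, sd2) only
p0 = [0.3, 0.8, -2.5, 1.3]
p_fit, ier = leastsq(residuals, p0, args=(x, y))
_, (zero, a1, a2) = solve_linear(p_fit, x, y)
print("means and widths:", p_fit)
print("zero level and heights:", zero, a1, a2)
```

    This leaves four parameters for the iterative search instead of seven, and no starting guesses are needed for the heights or the zero level.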
