How to return the fit error in Python curve_fit

问题

I'm trying to fit function to a data set of an experiment using python. I can get a really good approximation and the fit looks pretty good, but the error given for the parameters is incredibly high and I'm not sure how to fix this.

The function looks like this: Function

The data consist of the a time data set and a y data set. The variable "ve" is a linear velocity function, that's why in the code it is replaced with "a*x+b". Now the fit looks really good and theoretically the function should fit the data, but the error is crazily high. The code is the following:

import operator
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from lmfit import Model
from numpy import log, linspace, random
from matplotlib import colors as mcolors
from scipy.optimize import curve_fit

data6 = pd.read_csv('2594.csv')
x=data6.iloc[:18,0]
y=data6.iloc[:18,1]

def func(x, a, b, mo, q):
    return (4.9+a*x+b)*x+(a*x+b)*((mo/q)-x)*log(1-(q*x)/mo)-0.5*9.8*x*x

popt, pcov = curve_fit(func,x,y,bounds=((0, -100, 0, 0), (1000, 1000, 1, 1)))
plt.plot(x, func(x, *popt), 'g--', label='fit: a=%5.3f, b=%5.3f, mo=%5.3f, 
q=%5.3f' % tuple(popt))
plt.plot(x,y,':', label='Daten')
plt.grid(True)
plt.legend(loc='upper left')
plt.xlabel("t [s]")
plt.ylabel("z [m]")
plt.title('Anpassung vor Zeitpunkt T', )
plt.savefig('fit1.pdf')
plt.show()

Here is the fit for this line of code: Fit1

and the covariance Matrix:

[[ 3.66248820e+09  2.88800781e+09 -5.59803683e+06 -4.01121935e+05]
 [ 2.88800781e+09  2.27731332e+09 -4.44058731e+06 -3.17108449e+05]
 [-5.59803683e+06 -4.44058731e+06  2.43805434e+05  7.83731345e+03]
 [-4.01121935e+05 -3.17108449e+05  7.83731345e+03  2.65778118e+02]]

I also tried the following fit mode but I become errors of over 1400%:

fmodel = Model(func)
result = fmodel.fit(y, x=x, a=14, b=3.9, mo=0.8, q=0.002)

This is the fit report:

a:   926.607518 +/- 182751.047 (19722.59%) (init = 14)
b:   737.755741 +/- 143994.520 (19517.91%) (init = 3.9)
mo:  0.27745681 +/- 27.5360933 (9924.46%) (init = 0.8)
q:   0.00447098 +/- 0.60437392 (13517.72%) (init = 0.002)

And this is the resulting fit: Fit2 I would really appreciate some help. If possible a simple guide on how to minimize the error of the function!

The data looks like this:

x=[0.0333 0.0667 0.1    0.133  0.167  0.2    0.233  0.267  0.3    0.333  
   0.367  0.4    0.433  0.467  0.5    0.533  0.567  0.6   ]
y=[0.104 0.249 0.422 0.6   0.791 1.    1.23  1.47  1.74  2.02  2.33  2.64
   2.99  3.34  3.71  4.08  4.47  4.85 ]

Thank you!

回答1:

If you had printed out the full fit report from lmfit (or properly untangled to components of the covariance matrix from curve_fit) you would see that the parameters a and b are 100% correlated.

Basically, this is the fitting algorithm telling you that your data is not described well by you model and that you don't need that many parameters (or perhaps these parameters and this model) to describe your data.

Indeed, if you plot the data, there is a gentle slope up. Your function is the sum of three different terms:

   (4.9+a*x+b)*x
 + (a*x+b)*((mo/q)-x)*log(1-x/(mo/q)) 
 - 0.5*9.8*x*x

There are a couple of things to note:

mo and q only appear together, and as mo/q. They will not be independent.
a and b only appear together, and in the same form in multiple places
there are 2 purely quadratic terms in x, one of them hardwired.
the logarithmic term also has a quadratic prefactor. Importantly, the x data does not vary by more than 1 order of magnitude, so that log term won't really vary by much, giving mostly a third quadratic term. (as an aside: if you're taking logs, you should ensure that the argument is actually positive -- log(1-x*a) is asking for trouble)

To summarize: your model is too complicated for your data.

I very much doubt you need to log term at all. As it turns out, you can get a pretty good fit with simple parabolic model:

import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import ParabolicModel

x = np.array([0.0333, 0.0667, 0.1, 0.133, 0.167, 0.2, 0.233 , 0.267, 0.3 ,
          0.333, 0.367, 0.4, 0.433, 0.467, 0.5, 0.533 , 0.567 , 0.6 ])

y = np.array([0.104, 0.249 , 0.422, 0.6, 0.791, 1.0, 1.23, 1.47, 1.74,
          2.02, 2.33, 2.64, 2.99, 3.34, 3.71, 4.08, 4.47, 4.85 ])

qmodel = ParabolicModel()
result = qmodel.fit(y, x=x, a=1, b=2, c=0)
print(result.fit_report())
fitlabel = "fit: a=%5.3f, b=%5.3f, c=%5.3f" % (result.params['a'].value,
                                               result.params['b'].value,
                                               result.params['c'].value)
plt.plot(x, y, label='Daten')
plt.plot(x, result.best_fit, label=fitlabel)
plt.xlabel("t [s]")
plt.ylabel("z [m]")
plt.legend(loc='upper left')
plt.title("Anpassung vor Zeitpunkt T (Model: a*x^2+b*x+c)")
plt.show()

which will give a report of

[[Model]]
    Model(parabolic)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 9
    # data points      = 18
    # variables        = 3
    chi-square         = 0.00298906
    reduced chi-square = 1.9927e-04
    Akaike info crit   = -150.657052
    Bayesian info crit = -147.985936
[[Variables]]
    c: -0.02973853 +/- 0.01120090 (37.66%) (init = 0)
    b:  3.67707491 +/- 0.08142567 (2.21%) (init = 2)
    a:  7.51540814 +/- 0.12492370 (1.66%) (init = 1)
[[Correlations]] (unreported correlations are < 0.100)
    C(b, a) = -0.972
    C(c, b) = -0.891
    C(c, a) =  0.785

and a plot of

回答2:

The data and the formula came from a physics experiment so I had to use the function. I could determine the actual value of mo and I also had to think about the meaning of the different variables. First in the formula "vo" and "ve" have to be equal, because the data "starts" after a period of acceleration and also because in the original fit "ve" was assumed to be a linear function but physically it should be a constant velocity. That's why in the following code vo+ve=ve. That left the fit function with two parameters which could be determined easily and with a small error.

def func(x, ve, q):
    return (2*ve*x) + ve*((0.65/q) - x)*log(1 - (q*x)/0.65) - 0.5*9.8*x*x

popt, pcov = curve_fit(func,x,y,bounds=((-100, 0), (1500, 1.5)))
plt.plot(x, func(x, *popt), color='orange', label='Anpassung: $v_e$=%5.3f, q=%5.3f' % 
tuple(popt))
plt.plot(x,y,':', label='Daten')

print(pcov)

plt.grid(True)
plt.legend(loc='upper left')
plt.xlabel("t [s]")
plt.ylabel("z [m]")
plt.title('Anpassung vor Zeitpunkt T', )
plt.savefig('anpass.pdf')
plt.savefig('anpass.png')
plt.show()

Which gives the covariance matrix:

[[ 0.01807039 -0.00305967]
 [-0.00305967  0.00065509]]

And the Fit:

Fitted Function

Which are realistic values for the function.

Thank you for your help and I hope this post is useful for someone someday!

来源：https://stackoverflow.com/questions/57198987/how-to-return-the-fit-error-in-python-curve-fit

标签

python

curve-fitting

data-analysis

data-fitting