Question
I don't understand why curve_fit isn't able to estimate the covariance of the parameter, thus raising the OptimizeWarning below. The following MCVE demonstrates my problem:
MCVE Python snippet
from scipy.optimize import curve_fit

func = lambda x, a: a * x  # model: y = a * x
popt, pcov = curve_fit(f=func, xdata=[1], ydata=[1])
print(popt, pcov)
Output
\python-3.4.4\lib\site-packages\scipy\optimize\minpack.py:715:
OptimizeWarning: Covariance of the parameters could not be estimated
category=OptimizeWarning)
[ 1.] [[ inf]]
For a = 1 the function fits xdata and ydata exactly. Why isn't the error/variance 0, or something close to 0, but inf instead?
There is this quote from the curve_fit SciPy Reference Guide:
If the Jacobian matrix at the solution doesn’t have a full rank, then ‘lm’ method returns a matrix filled with np.inf, on the other hand ‘trf’ and ‘dogbox’ methods use Moore-Penrose pseudoinverse to compute the covariance matrix.
So, what's the underlying problem? Why doesn't the Jacobian matrix at the solution have a full rank?
Answer 1:
The formula for the covariance of the parameters (Wikipedia) has the number of degrees of freedom in the denominator. The degrees of freedom are computed as (number of data points) - (number of parameters), which is 1 - 1 = 0 in your example. This is where SciPy checks the number of degrees of freedom before dividing by it.
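A minimal sketch of that check (illustrative only; the variable names and structure here are my assumptions, not SciPy's actual internals):

import numpy as np

ydata = np.array([1.0])
popt = np.array([1.0])  # the fitted parameter from the MCVE

# degrees of freedom = (number of data points) - (number of parameters)
dof = ydata.size - popt.size  # 1 - 1 = 0

if dof <= 0:
    # no residual degrees of freedom left, so the residual variance
    # cannot be estimated and the covariance matrix is filled with inf
    pcov = np.full((popt.size, popt.size), np.inf)

print(dof, pcov)  # 0 [[inf]]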
With xdata = [1, 2], ydata = [1, 2] you would get zero covariance (note that the model still fits exactly: an exact fit is not the problem).
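You can check this with the same model as in the MCVE; with two data points there is one degree of freedom and the residuals are zero, so the covariance comes out as zero:

from scipy.optimize import curve_fit

func = lambda x, a: a * x
popt, pcov = curve_fit(f=func, xdata=[1, 2], ydata=[1, 2])
print(popt, pcov)  # [1.] [[0.]]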
This is the same sort of issue as the sample variance being undefined when the sample size N is 1 (the formula for the sample variance has N - 1 in the denominator). If we take only a size-1 sample from the population, we don't estimate its variance to be zero; we know nothing about the variance.
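NumPy's sample variance shows the same behaviour once you ask for the unbiased (N - 1) estimator via ddof=1:

import numpy as np

print(np.var([1.0, 2.0], ddof=1))  # 0.5 -- well-defined for N = 2
print(np.var([1.0], ddof=1))       # nan (with a RuntimeWarning), since N - 1 = 0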
Source: https://stackoverflow.com/questions/41725377/why-isnt-curve-fit-able-to-estimate-the-covariance-of-the-parameter-if-the-pa