Confidence interval for exponential curve fit

问题

I'm trying to obtain a confidence interval on an exponential fit to some x,y data (available here). Here's the MWE I have to find the best exponential fit to the data:

from pylab import *
from scipy.optimize import curve_fit

# Read data.
x, y = np.loadtxt('exponential_data.dat', unpack=True)

def func(x, a, b, c):
    '''Exponential 3-param function.'''
    return a * np.exp(b * x) + c

# Find best fit.
popt, pcov = curve_fit(func, x, y)
print popt

# Plot data and best fit curve.
scatter(x, y)
x = linspace(11, 23, 100)
plot(x, func(x, *popt), c='r')
show()

which produces:

How can I obtain the 95% (or some other value) confidence interval on this fit preferably using either pure python, numpy or scipy (which are the packages I already have installed)?

回答1:

Gabriel's answer is incorrect. Here in red the 95% confidence band for his data as calculated by GraphPad Prism:

Background: the "confidence interval of a fitted curve" is typically called confidence band. For a 95% confidence band, one can be 95% confident that it contains the true curve. (This is different from prediction bands, shown above in gray. Prediction bands are about future data points. For more details, see, e.g., this page of the GraphPad Curve Fitting Guide.)

In Python, kmpfit can calculate the confidence band for non-linear least squares. Here for Gabriel's example:

from pylab import *
from kapteyn import kmpfit

x, y = np.loadtxt('_exp_fit.txt', unpack=True)

def model(p, x):
  a, b, c = p
  return a*np.exp(b*x)+c

f = kmpfit.simplefit(model, [.1, .1, .1], x, y)
print f.params

# confidence band
a, b, c = f.params
dfdp = [np.exp(b*x), a*x*np.exp(b*x), 1]
yhat, upper, lower = f.confidence_band(x, dfdp, 0.95, model)

scatter(x, y, marker='.', s=10, color='#0000ba')
ix = np.argsort(x)
for i, l in enumerate((upper, lower, yhat)):
  plot(x[ix], l[ix], c='g' if i == 2 else 'r', lw=2)
show()

The dfdp are the partial derivatives ∂f/∂p of the model f = a*e^(b*x) + c with respect to each parameter p (i.e., a, b, and c). For background, see the kmpfit Tutorial or this page of the GraphPad Curve Fitting Guide. (Unlike my sample code, the kmpfit Tutorial does not use confidence_band() from the library but its own, slightly different, implementation.)

Finally, the Python plot matches the Prism one:

回答2:

You can use the uncertainties module to do the uncertainty calculations. uncertainties keeps track of uncertainties and correlation. You can create correlated uncertainties.ufloat directly from the output of curve_fit.

To be able to do those calculation on non-builtin operations such as exp you need to use the functions from uncertainties.unumpy.

You should also avoid your from pylab import * import. This even overwrites python built-ins such as sum.

A complete example:

import numpy as np
from scipy.optimize import curve_fit
import uncertainties as unc
import matplotlib.pyplot as plt
import uncertainties.unumpy as unp


def func(x, a, b, c):
    '''Exponential 3-param function.'''
    return a * np.exp(b * x) + c

x, y = np.genfromtxt('data.txt', unpack=True)

popt, pcov = curve_fit(func, x, y)

a, b, c = unc.correlated_values(popt, pcov)

# Plot data and best fit curve.
plt.scatter(x, y, s=3, linewidth=0, alpha=0.3)

px = np.linspace(11, 23, 100)
# use unumpy.exp
py = a * unp.exp(b * px) + c

nom = unp.nominal_values(py)
std = unp.std_devs(py)

# plot the nominal value
plt.plot(px, nom, c='r')

# And the 2sigma uncertaintie lines
plt.plot(px, nom - 2 * std, c='c')
plt.plot(px, nom + 2 * std, c='c')
plt.savefig('fit.png', dpi=300)

And the result:

回答3:

Notice: the actual answer to obtaining the fitted curve's confidence interval is given by Ulrich here.

After some research (see here, here and 1.96) I came up with my own solution.

It accepts an arbitrary X% confidence interval and plots upper and lower curves.

Here's the MWE:

from pylab import *
from scipy.optimize import curve_fit
from scipy import stats


def func(x, a, b, c):
    '''Exponential 3-param function.'''
    return a * np.exp(b * x) + c


# Read data.
x, y = np.loadtxt('exponential_data.dat', unpack=True)

# Define confidence interval.
ci = 0.95
# Convert to percentile point of the normal distribution.
# See: https://en.wikipedia.org/wiki/Standard_score
pp = (1. + ci) / 2.
# Convert to number of standard deviations.
nstd = stats.norm.ppf(pp)
print nstd

# Find best fit.
popt, pcov = curve_fit(func, x, y)
# Standard deviation errors on the parameters.
perr = np.sqrt(np.diag(pcov))
# Add nstd standard deviations to parameters to obtain the upper confidence
# interval.
popt_up = popt + nstd * perr
popt_dw = popt - nstd * perr

# Plot data and best fit curve.
scatter(x, y)
x = linspace(11, 23, 100)
plot(x, func(x, *popt), c='g', lw=2.)
plot(x, func(x, *popt_up), c='r', lw=2.)
plot(x, func(x, *popt_dw), c='r', lw=2.)
text(12, 0.5, '{}% confidence interval'.format(ci * 100.))    

show()

回答4:

curve_fit() returns the covariance matrix - pcov -- which holds the estimated uncertainties (1 sigma). This assumes errors are normally distributed, which is sometimes questionable.

You might also consider using the lmfit package (pure python, built on top of scipy), which provides a wrapper around scipy.optimize fitting routines (including leastsq(), which is what curve_fit() uses) and can, among other things, calculate confidence intervals explicitly.

回答5:

I've always been a fan of simple bootstrapping to get confidence intervals. If you have n data points, then use the random package to select n points from your data WITH RESAMPLING (i.e. allow your program to get the same point multiple times if that's what it wants to do - very important). Once you've done that, plot the resampled points and get the best fit. Do this 10,000 times, getting a new fit line each time. Then your 95% confidence interval is the pair of lines that enclose 95% of the best fit lines you made.

It's a pretty easy method to program in Python, but it's a bit unclear how this would work out from a statistical point of view. Some more information on why you want to do this would probably lead to more appropriate answers for your task.

来源：https://stackoverflow.com/questions/24633664/confidence-interval-for-exponential-curve-fit

标签

python

numpy

scipy

confidence-interval