Solving coefficients of data set using curve_fit from scipy.optimize

Submitted by て烟熏妆下的殇ゞ on 2019-12-11 05:07:50

Question


I have an array A exported from Excel, containing the data values shown below. The 1st column x and the 2nd column y are dependent variables, while the 3rd column z is the independent variable (the output).

import numpy as np
from xlrd import open_workbook

Data = open_workbook("simple.xls")
sheet = Data.sheet_by_name('Sheet1')

A=[]

# Read row by row
for rownum in range(sheet.nrows):
    rowValues = sheet.row_values(rownum)
    A.append(rowValues)

A = np.array(A)

A=
[[  0.00000000e+00   1.49761692e-05   0.00000000e+00]
 [  8.85000000e+02   1.49761692e-05   6.41362500e-02]
 [  1.48500000e+03   1.49761692e-05   1.19340000e-01]
 [  2.09000000e+03   1.49761692e-05   1.58760000e-01]
 [  3.36000000e+03   1.49761692e-05   2.08080000e-01]
 [  3.87000000e+03   1.49761692e-05   2.16933750e-01]
 [  6.48000000e+03   1.49761692e-05   2.46746250e-01]
 [  8.22000000e+03   1.49761692e-05   2.54700000e-01]
 [  1.05300000e+04   1.49761692e-05   2.59470000e-01]
 [  1.58250000e+04   1.49761692e-05   2.62035000e-01]
 [  2.37600000e+04   1.49761692e-05   2.68751250e-01]
 [  8.18400000e+04   1.49761692e-05   2.92848750e-01]
 [  0.00000000e+00   8.57250668e-06   0.00000000e+00]
 [  6.75000000e+02   8.57250668e-06   4.97436412e-02]
 [  1.27500000e+03   8.57250668e-06   1.27749375e-01]
 [  1.88000000e+03   8.57250668e-06   1.88617039e-01]
 [  3.15000000e+03   8.57250668e-06   2.65089780e-01]
 [  3.66000000e+03   8.57250668e-06   2.90344849e-01]
 [  6.27000000e+03   8.57250668e-06   3.36295316e-01]
 [  8.01000000e+03   8.57250668e-06   3.42702439e-01]
 [  1.03200000e+04   8.57250668e-06   3.65205982e-01]
 [  1.56150000e+04   8.57250668e-06   3.67269626e-01]
 [  2.35500000e+04   8.57250668e-06   3.87296798e-01]
 [  8.16300000e+04   8.57250668e-06   4.43486869e-01]
 [  0.00000000e+00   4.26671486e-06   0.00000000e+00]
 [  4.65000000e+02   4.26671486e-06   2.61407250e-02]
 [  1.06500000e+03   4.26671486e-06   1.22371762e-01]
 [  1.67000000e+03   4.26671486e-06   2.19629475e-01]
 [  2.94000000e+03   4.26671486e-06   3.26680087e-01]
 [  3.45000000e+03   4.26671486e-06   3.34340662e-01]
 [  6.06000000e+03   4.26671486e-06   4.18330575e-01]
 [  7.80000000e+03   4.26671486e-06   4.50631350e-01]
 [  1.01100000e+04   4.26671486e-06   4.55053950e-01]
 [  1.54050000e+04   4.26671486e-06   4.60937587e-01]
 [  2.33400000e+04   4.26671486e-06   5.10770813e-01]
 [  8.14200000e+04   4.26671486e-06   6.12569587e-01]
 [  0.00000000e+00   2.13335743e-06   0.00000000e+00]
 [  8.55000000e+02   2.13335743e-06   1.03773150e-01]
 [  1.46000000e+03   2.13335743e-06   2.21130000e-01]
 [  2.73000000e+03   2.13335743e-06   3.45515625e-01]
 [  3.24000000e+03   2.13335743e-06   3.85634925e-01]
 [  5.85000000e+03   2.13335743e-06   4.76061300e-01]
 [  7.59000000e+03   2.13335743e-06   4.79220300e-01]
 [  1.51950000e+04   2.13335743e-06   5.24709900e-01]
 [  2.31300000e+04   2.13335743e-06   5.64829200e-01]
 [  8.12100000e+04   2.13335743e-06   6.46568325e-01]
 [  0.00000000e+00   1.42359023e-06   0.00000000e+00]
 [  6.45000000e+02   1.42359023e-06   8.03596500e-02]
 [  1.25000000e+03   1.42359023e-06   2.36700000e-01]
 [  2.52000000e+03   1.42359023e-06   4.25941650e-01]
 [  3.03000000e+03   1.42359023e-06   4.61683350e-01]
 [  5.64000000e+03   1.42359023e-06   5.99561100e-01]
 [  7.38000000e+03   1.42359023e-06   6.05952000e-01]
 [  9.69000000e+03   1.42359023e-06   6.16958550e-01]
 [  1.49850000e+04   1.42359023e-06   6.57434250e-01]
 [  2.29200000e+04   1.42359023e-06   6.45954300e-01]
 [  8.10000000e+04   1.42359023e-06   7.79689800e-01]
 [  0.00000000e+00   9.36010573e-07   0.00000000e+00]
 [  4.35000000e+02   9.36010573e-07   3.40200000e-02]
 [  1.04000000e+03   9.36010573e-07   1.91160000e-01]
 [  2.31000000e+03   9.36010573e-07   3.77640000e-01]
 [  2.82000000e+03   9.36010573e-07   4.44240000e-01]
 [  5.43000000e+03   9.36010573e-07   5.50440000e-01]
 [  7.17000000e+03   9.36010573e-07   5.36580000e-01]
 [  9.48000000e+03   9.36010573e-07   5.83740000e-01]
 [  1.47750000e+04   9.36010573e-07   5.87340000e-01]
 [  2.27100000e+04   9.36010573e-07   6.33060000e-01]
 [  8.07900000e+04   9.36010573e-07   7.36200000e-01]]

x= A[:,0]
y= A[:,1]
z= A[:,2]

I have a model function that I want to fit to the data in array A in order to solve for the coefficients a and b.

def func(data,a,b):
    return a/(data[:,1]*b)*np.log(1+(data[:,1]*b/a)*(1-np.exp(-a*data[:,0]))) 
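For reference, with x = data[:,0] and y = data[:,1], the function above can be written as follows. The value at x = 0 and the large-x limit (for a > 0) follow directly from the expression:

```latex
z(x, y) = \frac{a}{b\,y}\,\ln\!\left(1 + \frac{b\,y}{a}\left(1 - e^{-a x}\right)\right),
\qquad z(0, y) = 0,
\qquad \lim_{x \to \infty} z(x, y) = \frac{a}{b\,y}\,\ln\!\left(1 + \frac{b\,y}{a}\right) \quad (a > 0)
```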

The rest of the code shows the initial guess of the coefficients a and b, the scipy.optimize.curve_fit() function, and matplotlib.pyplot to plot the result.

import scipy.optimize

guess = [3.0e-5, 128]

print(guess, 'initial guessed parameters')

params, pcov = scipy.optimize.curve_fit(func, A[:,:2], A[:,2], guess)

print(params, 'fitted parameters')

import matplotlib.pyplot as plt 
plt.plot(x,func(A,params[0],params[1]),'-r',x,z,'o') 
plt.title('Plot') 
plt.legend(['Fit', 'Data'], loc='lower right')
plt.show()

The resulting plot is this:

[plot image: 'Fit' line and 'Data' points against x]

And the resulting coefficients are:

[3e-05, 128] initial guessed parameters
[  2.00773153e-04   1.22752179e+02] fitted parameters

Because all the data is in the single array A, scipy seems to treat the points as joined one after another, so the end of each curve goes back to the origin, which is also the start of the next curve.

How should I write this in Python so that scipy.optimize.curve_fit knows that the array consists of multiple curves, rather than one single conjoined data set? Any advice would be greatly appreciated.


Answer 1:


I've edited the code you gave a bit (appended below) so that it can be cut and pasted into Python, in case anyone else wants to try it.

I'm not sure I understand your question, though. It appears x and y are your independent (not dependent) variables and z your dependent variable (i.e., the thing computed from each (x,y) pair). In this case, I'd think you'd want a three-dimensional plot - currently, if I'm reading this right, you're plotting z vs x and not showing y.

Assuming you do want to do this, I agree with the comments that it'd be best to split the separate curves apart - I would think the returns-to-zero negatively impact your fit. You can use np.where(A[:,0]==0)[0] to find the indices where x==0 and use that in a loop to split apart A - though I think np.split(A,np.where(A[:,0]==0)[0][1:]) does it for you in one line (the [1:] skips the leading index 0, which would otherwise produce an empty first piece).
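A minimal sketch of that split, on a made-up toy array standing in for A (two short curves, each restarting at x == 0). It also shows why passing the index 0 to np.split leaves an empty first piece:

```python
import numpy as np

# Toy stand-in for A: two curves, each restarting at x == 0 (values are made up)
A = np.array([
    [0.0,    1.5e-5, 0.00],
    [885.0,  1.5e-5, 0.06],
    [1485.0, 1.5e-5, 0.12],
    [0.0,    8.6e-6, 0.00],
    [675.0,  8.6e-6, 0.05],
])

# Row indices where a new curve begins
starts = np.where(A[:, 0] == 0)[0]

# Splitting at index 0 as well produces an empty array in front
naive = np.split(A, starts)

# Skipping the leading 0 gives exactly one array per curve
curves = np.split(A, starts[1:])

for curve in curves:
    print(curve.shape)
```

Each element of curves is then an ordinary (n, 3) array that can be fitted or plotted on its own.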

from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt

def func(data,a,b):
    return a/(data[:,1]*b)*np.log(1+(data[:,1]*b/a)*(1-np.exp(-a*data[:,0])))

A=np.array(
(0.00000000e+00, 1.49761692e-05, 0.00000000e+00,
 8.85000000e+02, 1.49761692e-05, 6.41362500e-02,
 1.48500000e+03, 1.49761692e-05, 1.19340000e-01,
 2.09000000e+03, 1.49761692e-05, 1.58760000e-01,
 3.36000000e+03, 1.49761692e-05, 2.08080000e-01,
 3.87000000e+03, 1.49761692e-05, 2.16933750e-01,
 6.48000000e+03, 1.49761692e-05, 2.46746250e-01,
 8.22000000e+03, 1.49761692e-05, 2.54700000e-01,
 1.05300000e+04, 1.49761692e-05, 2.59470000e-01,
 1.58250000e+04, 1.49761692e-05, 2.62035000e-01,
 2.37600000e+04, 1.49761692e-05, 2.68751250e-01,
 8.18400000e+04, 1.49761692e-05, 2.92848750e-01,
 0.00000000e+00, 8.57250668e-06, 0.00000000e+00,
 6.75000000e+02, 8.57250668e-06, 4.97436412e-02,
 1.27500000e+03, 8.57250668e-06, 1.27749375e-01,
 1.88000000e+03, 8.57250668e-06, 1.88617039e-01,
 3.15000000e+03, 8.57250668e-06, 2.65089780e-01,
 3.66000000e+03, 8.57250668e-06, 2.90344849e-01,
 6.27000000e+03, 8.57250668e-06, 3.36295316e-01,
 8.01000000e+03, 8.57250668e-06, 3.42702439e-01,
 1.03200000e+04, 8.57250668e-06, 3.65205982e-01,
 1.56150000e+04, 8.57250668e-06, 3.67269626e-01,
 2.35500000e+04, 8.57250668e-06, 3.87296798e-01,
 8.16300000e+04, 8.57250668e-06, 4.43486869e-01,
 0.00000000e+00, 4.26671486e-06, 0.00000000e+00,
 4.65000000e+02, 4.26671486e-06, 2.61407250e-02,
 1.06500000e+03, 4.26671486e-06, 1.22371762e-01,
 1.67000000e+03, 4.26671486e-06, 2.19629475e-01,
 2.94000000e+03, 4.26671486e-06, 3.26680087e-01,
 3.45000000e+03, 4.26671486e-06, 3.34340662e-01,
 6.06000000e+03, 4.26671486e-06, 4.18330575e-01,
 7.80000000e+03, 4.26671486e-06, 4.50631350e-01,
 1.01100000e+04, 4.26671486e-06, 4.55053950e-01,
 1.54050000e+04, 4.26671486e-06, 4.60937587e-01,
 2.33400000e+04, 4.26671486e-06, 5.10770813e-01,
 8.14200000e+04, 4.26671486e-06, 6.12569587e-01,
 0.00000000e+00, 2.13335743e-06, 0.00000000e+00,
 8.55000000e+02, 2.13335743e-06, 1.03773150e-01,
 1.46000000e+03, 2.13335743e-06, 2.21130000e-01,
 2.73000000e+03, 2.13335743e-06, 3.45515625e-01,
 3.24000000e+03, 2.13335743e-06, 3.85634925e-01,
 5.85000000e+03, 2.13335743e-06, 4.76061300e-01,
 7.59000000e+03, 2.13335743e-06, 4.79220300e-01,
 1.51950000e+04, 2.13335743e-06, 5.24709900e-01,
 2.31300000e+04, 2.13335743e-06, 5.64829200e-01,
 8.12100000e+04, 2.13335743e-06, 6.46568325e-01,
 0.00000000e+00, 1.42359023e-06, 0.00000000e+00,
 6.45000000e+02, 1.42359023e-06, 8.03596500e-02,
 1.25000000e+03, 1.42359023e-06, 2.36700000e-01,
 2.52000000e+03, 1.42359023e-06, 4.25941650e-01,
 3.03000000e+03, 1.42359023e-06, 4.61683350e-01,
 5.64000000e+03, 1.42359023e-06, 5.99561100e-01,
 7.38000000e+03, 1.42359023e-06, 6.05952000e-01,
 9.69000000e+03, 1.42359023e-06, 6.16958550e-01,
 1.49850000e+04, 1.42359023e-06, 6.57434250e-01,
 2.29200000e+04, 1.42359023e-06, 6.45954300e-01,
 8.10000000e+04, 1.42359023e-06, 7.79689800e-01,
 0.00000000e+00, 9.36010573e-07, 0.00000000e+00,
 4.35000000e+02, 9.36010573e-07, 3.40200000e-02,
 1.04000000e+03, 9.36010573e-07, 1.91160000e-01,
 2.31000000e+03, 9.36010573e-07, 3.77640000e-01,
 2.82000000e+03, 9.36010573e-07, 4.44240000e-01,
 5.43000000e+03, 9.36010573e-07, 5.50440000e-01,
 7.17000000e+03, 9.36010573e-07, 5.36580000e-01,
 9.48000000e+03, 9.36010573e-07, 5.83740000e-01,
 1.47750000e+04, 9.36010573e-07, 5.87340000e-01,
 2.27100000e+04, 9.36010573e-07, 6.33060000e-01,
 8.07900000e+04, 9.36010573e-07, 7.36200000e-01))
A = A.reshape(len(A)//3, 3)  # integer division, so the shape stays an int
x= A[:,0]
y= A[:,1]
z= A[:,2]

guess = [3.0e-5, 128]
print(guess, 'initial guessed parameters')

params, pcov = curve_fit(func, A[:,:2], A[:,2], guess)
print(params, 'fitted parameters')

plt.plot(x,func(A,params[0],params[1]),'-r',x,z,'o') 
plt.title('Plot') 
plt.legend(['Fit', 'Data'], loc='lower right', numpoints=1)
plt.show()
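If you would rather keep the single array, one way to stop the fitted line from running back to the origin is to put a NaN between curves before plotting, since matplotlib breaks a line at NaN values. This is only a sketch: break_at_restarts and plot_with_gaps are hypothetical helper names, not part of the code above, and the toy arrays are made up.

```python
import numpy as np

def break_at_restarts(x, values):
    """Insert a NaN wherever x restarts at 0, so a line plot has a gap there."""
    x = np.asarray(x, dtype=float)
    values = np.asarray(values, dtype=float)
    starts = np.where(x == 0)[0]
    starts = starts[starts > 0]  # no gap is needed before the first curve
    return np.insert(x, starts, np.nan), np.insert(values, starts, np.nan)

# Toy data: two curves, each restarting at x == 0 (values are made up)
x = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
fit = np.array([0.0, 0.5, 0.7, 0.0, 0.6, 0.9])

xg, fg = break_at_restarts(x, fit)
print(xg)
print(fg)

def plot_with_gaps(x, fit, z):
    """Plot the fit as separate line segments and the data as points."""
    import matplotlib.pyplot as plt  # imported here so the helper above needs only numpy
    xg, fg = break_at_restarts(x, fit)
    plt.plot(xg, fg, '-r', x, z, 'o')
    plt.legend(['Fit', 'Data'], loc='lower right')
    plt.show()
```

With the arrays from the code above, the call would be plot_with_gaps(x, func(A, params[0], params[1]), z).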



Answer 2:


It seems that your dataset A contains all those curves back to back.

Instead, you could split your dataset every time A[:,0] == 0.00000000e+00. After splitting it into 6 datasets, you could fit to each separately.

But if I understand your problem correctly, you would also like the parameters a and b to be the same for every dataset, correct?

In order to help you achieve that, I'm going to shamelessly plug my symfit package, which wraps curve_fit to make such problems easier to solve.

In symfit, you would do the following:

from symfit import Fit, variables, parameters, log, exp

datasets = [A_1, A_2, ...] # I'm going to assume this holds the untangled datasets one through six

xs = variables('x_1, x_2, x_3, x_4, x_5, x_6')
ys = variables('y_1, y_2, y_3, y_4, y_5, y_6')
zs = variables('z_1, ...') # same for z
a, b = parameters('a, b')

model_dict = {
    z: a/(y * b) * log(1 + (y * b/a) * (1 - exp(- a * x))) 
        for x, y, z in zip(xs, ys, zs) 
}

This code will create a vector-valued model, which allows you to fit this system of equations simultaneously (with the same instance of a and b in each!). In order to fit, we can now simply do the following:

fit = Fit(model_dict, 
     x_1=datasets[0][:,0], x_2=datasets[1][:,0], ..., 
     y_1=datasets[0][:,1], y_2=datasets[1][:,1], ..., 
     z_1=datasets[0][:,2], z_2=datasets[1][:,2], ...
)
fit_result = fit.execute()  # run the fit

I didn't write everything out in full, but I hope this gives you an idea of how to complete it. More info can be found in the symfit docs.

As a final remark, note that I have used a symbolic exp and log, not numpy's.
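For comparison, plain curve_fit treats every row as an independent (x, y) -> z sample, so a single call on the stacked array already uses the same a and b for all curves, and the row order should not affect the result. The sketch below checks this on synthetic, noise-free data. The values true_a, true_b, x_grid and y_levels are made up for illustration; only func and the initial guess come from the question.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(data, a, b):
    x, y = data[:, 0], data[:, 1]
    return a / (y * b) * np.log(1 + (y * b / a) * (1 - np.exp(-a * x)))

# Hypothetical "true" coefficients and sampling grid (not from the question's data)
true_a, true_b = 2.0e-4, 120.0
x_grid = np.array([0.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0, 16000.0, 80000.0])
y_levels = [1.5e-5, 4.0e-6, 1.0e-6]

# Stack the curves back to back, like A in the question: columns x, y, z
A = np.vstack([
    np.column_stack([x_grid, np.full_like(x_grid, y), np.zeros_like(x_grid)])
    for y in y_levels
])
A[:, 2] = func(A, true_a, true_b)

guess = [3.0e-5, 128]

# One fit over all curves at once: a and b are shared by construction
params, _ = curve_fit(func, A[:, :2], A[:, 2], guess)

# Same fit with the rows shuffled: the result should agree up to solver tolerance
rng = np.random.default_rng(0)
shuffled = A[rng.permutation(len(A))]
params_shuffled, _ = curve_fit(func, shuffled[:, :2], shuffled[:, 2], guess)

print(params, 'fitted parameters')
print(params_shuffled, 'fitted parameters (shuffled rows)')
```

If that holds for your data as well, the lines running back to the origin are a matter of how the stacked array is plotted, and splitting is mainly needed for the plot or for fitting each curve with its own coefficients.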



Source: https://stackoverflow.com/questions/21264227/solving-coefficients-of-data-set-using-curve-fit-from-scipy-optimize
