statsmodels

Calculate coefficient of determination (R2) and root mean square error (RMSE) for non-linear curve fitting in Python

Submitted by 二次信任 on 2019-12-05 02:38:30
Question: How do I calculate the coefficient of determination (R2) and the root mean square error (RMSE) for non-linear curve fitting in Python? The following code goes as far as the curve fitting; how do I then calculate R2 and RMSE?

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

x = np.linspace(0, 4, 50)
y = func(x, 2.5, 1.3, 0.5)
yn = y + 0.2*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn)
plt.figure()
plt.plot(x,
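One standard way to answer this, sketched below against the question's own variables (x, yn, func, popt); this is a generic recipe, not code from the original post: compute the residuals of the fitted curve and derive both statistics from them.

residuals = yn - func(x, *popt)
ss_res = np.sum(residuals**2)               # residual sum of squares
ss_tot = np.sum((yn - np.mean(yn))**2)      # total sum of squares
r_squared = 1 - ss_res / ss_tot             # coefficient of determination
rmse = np.sqrt(np.mean(residuals**2))       # root mean square error
print(r_squared, rmse)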

Pandas Dataframe AttributeError: 'DataFrame' object has no attribute 'design_info'

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-05 02:36:48
I am trying to use the predict() function of the statsmodels.formula.api OLS implementation. When I pass a new data frame to the function to get predicted values for an out-of-sample dataset, result.predict(newdf) returns the following error: 'DataFrame' object has no attribute 'design_info'. What does this mean and how do I fix it? The full traceback is:

p = result.predict(newdf)
File "C:\Python27\lib\site-packages\statsmodels\base\model.py", line 878, in predict
    exog = dmatrix(self.model.data.orig_exog.design_info.builder,
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2088,
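For context: design_info is metadata that patsy attaches to a design matrix when the model is built from a formula, and this error usually means that metadata is missing from the stored exog. A minimal sketch of the usual remedy, with made-up data (the names result and newdf echo the question): fit through the formula interface so statsmodels keeps the patsy design information, then predict with a DataFrame that carries the same column names.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({'y': [1.0, 2.1, 2.9, 4.2], 'x': [1, 2, 3, 4]})
result = smf.ols('y ~ x', data=df).fit()

newdf = pd.DataFrame({'x': [5, 6]})   # out-of-sample regressor values
print(result.predict(newdf))          # works because design_info travels with the fitted model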

How to get the P Value in a Variable from OLSResults in Python?

Submitted by 一世执手 on 2019-12-05 01:38:01
The OLSResults of

df2 = pd.read_csv("MultipleRegression.csv")
X = df2[['Distance', 'CarrierNum', 'Day', 'DayOfBooking']]
Y = df2['Price']
X = add_constant(X)
fit = sm.OLS(Y, X).fit()
print(fit.summary())

shows the p-values of each attribute to only 3 decimal places. I need to extract the p-value for each attribute, like Distance, CarrierNum etc., and print it in scientific notation. I can extract the coefficients using fit.params[0] or fit.params[1] etc.; I need the same for all their p-values. Also, what does it mean that all the p-values are 0? We have to do fit.pvalues[i] to get the answer, where i is the
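A short sketch of one way to do this, assuming the same fit object as above: because the exog is a DataFrame, fit.pvalues is a pandas Series indexed by variable name, so individual values can be pulled by label and formatted in scientific notation. (Values printed as 0.000 by summary() are just rounded; the underlying floats are rarely exactly zero.)

print(fit.pvalues['Distance'])                            # one p-value as a full-precision float
for name in fit.pvalues.index:                            # every p-value in scientific notation
    print('{}: {:.3e}'.format(name, fit.pvalues[name]))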

Python pandas has no attribute ols - Error (rolling OLS)

Submitted by 亡梦爱人 on 2019-12-05 01:26:48
Question: For my evaluation, I wanted to run a rolling 1000-window OLS regression estimation of the dataset found at this URL: https://drive.google.com/open?id=0B2Iv8dfU4fTUa3dPYW5tejA0bzg using the following Python script.

# /usr/bin/python -tt
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.formula.api import ols

df = pd.read_csv('estimated.csv', names=('x','y'))
model = pd.stats.ols.MovingOLS(y=df.Y, x=df[['y']], window_type='rolling', window=1000, intercept
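Context for the error: pd.stats.ols.MovingOLS was deprecated and then removed from pandas (gone as of pandas 0.20), which is why the attribute lookup fails. A hedged sketch of a replacement using RollingOLS from statsmodels (available in 0.10+); the column names follow the question's read_csv call:

import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS

df = pd.read_csv('estimated.csv', names=('x', 'y'))
exog = sm.add_constant(df[['x']])                    # intercept plus the regressor
rolling = RollingOLS(df['y'], exog, window=1000).fit()
print(rolling.params.head())                         # one row of coefficient estimates per window end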

Predicting values using an OLS model with statsmodels

Submitted by 断了今生、忘了曾经 on 2019-12-05 00:52:17
I calculated a model using OLS (multiple linear regression). I divided my data into train and test halves, and now I would like to predict values for the second half of the labels.

model = OLS(labels[:half], data[:half])
predictions = model.predict(data[half:])

The problem is that I get an error:

File "/usr/local/lib/python2.7/dist-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/regression/linear_model.py", line 281, in predict
    return np.dot(exog, params)
ValueError: matrices are not aligned

I have the following array shapes:
data.shape: (426, 215)
labels.shape: (426,)
If I
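A minimal sketch of the usual fix, reusing the question's variable names: the model has to be fitted first, and predict is then called on the fitted results object rather than on the bare model.

import statsmodels.api as sm

results = sm.OLS(labels[:half], data[:half]).fit()   # fit before predicting
predictions = results.predict(data[half:])           # new exog must have the same 215 columns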

Fama Macbeth Regression in Python (Pandas or Statsmodels)

Submitted by 梦想的初衷 on 2019-12-04 20:49:44
Question: Econometric background. A Fama-MacBeth regression is a procedure for running regressions on panel data (where there are N different individuals and each individual is observed over multiple periods T, e.g. days, months, years), so in total there are N x T observations. Note that it is fine if the panel is not balanced. The Fama-MacBeth regression first runs a cross-sectional regression for each period, i.e. it pools the N individuals together within a given period t, and does this for t = 1, ..., T. So in total T
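A hedged sketch of the two-step procedure described above (the helper name and the period/return/characteristic column arguments are illustrative assumptions, not code from the thread): run one cross-sectional OLS per period, then average the per-period coefficients over time, taking standard errors from their time-series variation.

import numpy as np
import pandas as pd
import statsmodels.api as sm

def fama_macbeth(df, period_col, y_col, x_cols):
    # Step 1: one cross-sectional OLS per period t = 1, ..., T.
    gammas = []
    for _, cross_section in df.groupby(period_col):
        X = sm.add_constant(cross_section[x_cols])
        gammas.append(sm.OLS(cross_section[y_col], X).fit().params)
    gammas = pd.DataFrame(gammas)
    # Step 2: time-series mean of the T coefficient vectors, with
    # standard errors from their variation across periods.
    return gammas.mean(), gammas.std() / np.sqrt(len(gammas))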

[Statsmodels]: How can I get statsmodels to return the p-value of an OLS object?

Submitted by 怎甘沉沦 on 2019-12-04 20:12:53
I'm quite new to programming and I'm jumping into Python to get some familiarity with data analysis and machine learning. I am following a tutorial on backward elimination for a multiple linear regression. Here is the code right now:

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

# Taking care of missin' data
#np.set_printoptions(threshold=100)
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values =
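The p-values the elimination loop needs live on the fitted results object as pvalues. A hedged sketch of the backward-elimination step such tutorials build toward (the 0.05 threshold and the helper name are assumptions; X and y are the arrays prepared above):

import numpy as np
import statsmodels.api as sm

def backward_elimination(X, y, significance=0.05):
    X = sm.add_constant(X)
    while True:
        results = sm.OLS(y, X).fit()
        worst = int(np.argmax(results.pvalues))      # position of the largest p-value
        if results.pvalues[worst] <= significance:
            return results                           # every remaining regressor is significant
        X = np.delete(X, worst, axis=1)              # drop the least significant column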

Creating dummy variables using pandas or statsmodels for the interaction of two columns

Submitted by 点点圈 on 2019-12-04 19:40:35
I have a data frame like this:

Index  ID   Industry  years_spend  asset
6646   892  4         4            144.977037
2347   315  10        8            137.749138
7342   985  1         5            104.310217
137    18   5         5            156.593396
2840   381  11        2            229.538828
6579   883  11        1            171.380125
1776   235  4         7            217.734377
2691   361  1         2            148.865341
815    110  15        4            233.309491
2932   393  17        5            187.281724

I want to create dummy variables for Industry x years_spend, which gives len(df.Industry.value_counts()) * len(df.years_spend.value_counts()) variables; for example, d_11_4 = 1 for all rows that have Industry == 11 and years_spend == 4, and d_11_4 = 0 otherwise. Then I can use these vars for some
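A hedged sketch of one way to get these columns with pandas (the toy frame below only mimics the question's shape): concatenate the two columns into a single label per row and one-hot encode that label, which yields exactly one 0/1 column per observed Industry x years_spend pair. If the dummies never need to exist as explicit columns, the formula interface can express the same interaction as C(Industry):C(years_spend).

import pandas as pd

df = pd.DataFrame({'Industry':    [4, 10, 1, 5, 11],
                   'years_spend': [4, 8, 5, 5, 2]})
labels = 'd_' + df['Industry'].astype(str) + '_' + df['years_spend'].astype(str)
dummies = pd.get_dummies(labels)          # one 0/1 column per observed pair, e.g. d_11_2
df = pd.concat([df, dummies], axis=1)
print(df.head())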

Logistic Regression in statsmodels “LinAlgError: Singular matrix”

Submitted by Deadly on 2019-12-04 19:31:24
Not sure why, but I'm getting a "numpy.linalg.linalg.LinAlgError: Singular matrix" error when fitting a logistic regression model.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import statsmodels.api as sm

data = load_breast_cancer()
y = data.target
X = data.data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2)
X_train = sm.add_constant(X_train)
X_test = sm.add_constant(X_test)
model = sm.Logit(y_train, X_train)
fit = model.fit()  # error appears on this line
fit.summary2()
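For context, the breast-cancer features are strongly collinear and sit on very different scales, so the Hessian that Logit's default Newton solver inverts can become numerically singular. A hedged sketch of one common workaround, continuing the question's variables: standardize the features and use an optimizer that does not invert the exact Hessian at every step; statsmodels' fit_regularized() is another option for ill-conditioned problems.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(X_train[:, 1:])                    # skip the added constant column
X_train_std = sm.add_constant(scaler.transform(X_train[:, 1:]))
model = sm.Logit(y_train, X_train_std)
fit = model.fit(method='bfgs', maxiter=200)                      # BFGS avoids the per-step Hessian inversion
print(fit.summary2())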

How to extract the regression coefficient from statsmodels.api?

Submitted by 故事扮演 on 2019-12-04 19:07:19
Question:

result = sm.OLS(gold_lookback, silver_lookback).fit()

After I get the result, how can I get the coefficient and the constant? In other words, if y = ax + c, how do I get the values of a and c?

Answer 1: You can use the params property of a fitted model to get the coefficients. For example, the following code:

import statsmodels.api as sm
import numpy as np
np.random.seed(1)
X = sm.add_constant(np.arange(100))
y = np.dot(X, [1,2]) + np.random.normal(size=100)
result = sm.OLS(y, X).fit()
print(result
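A short follow-up sketch to the answer's example: params is ordered like the columns of X, and add_constant puts the constant first, so the intercept c precedes the slope a.

c, a = result.params                          # roughly 1 and 2 here, matching np.dot(X, [1, 2])
print('slope a =', a, 'intercept c =', c)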