statsmodels

Calculate coefficient of determination (R2) and root mean square error (RMSE) for non-linear curve fitting in Python

Submitted by 二次信任 on 2019-12-05 02:38:30
Question: How do I calculate the coefficient of determination (R2) and the root mean square error (RMSE) for non-linear curve fitting in Python? The following code goes as far as the curve fitting; how do I then calculate R2 and RMSE?

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

x = np.linspace(0, 4, 50)
y = func(x, 2.5, 1.3, 0.5)
yn = y + 0.2*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn)
plt.figure()
plt.plot(x,
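One standard way to answer this, sketched below against the question's own variables (x, yn, func, popt); this is a generic recipe, not code from the original post: compute the residuals of the fitted curve and derive both statistics from them.

residuals = yn - func(x, *popt)
ss_res = np.sum(residuals**2)               # residual sum of squares
ss_tot = np.sum((yn - np.mean(yn))**2)      # total sum of squares
r_squared = 1 - ss_res / ss_tot             # coefficient of determination
rmse = np.sqrt(np.mean(residuals**2))       # root mean square error
print(r_squared, rmse)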

Pandas Dataframe AttributeError: 'DataFrame' object has no attribute 'design_info'

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-05 02:36:48
I am trying to use the predict() function of the statsmodels.formula.api OLS implementation. When I pass a new data frame to the function to get predicted values for an out-of-sample dataset, result.predict(newdf) returns the following error: 'DataFrame' object has no attribute 'design_info'. What does this mean and how do I fix it? The full traceback is:

p = result.predict(newdf)
File "C:\Python27\lib\site-packages\statsmodels\base\model.py", line 878, in predict
    exog = dmatrix(self.model.data.orig_exog.design_info.builder,
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2088,
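For context: design_info is metadata that patsy attaches to a design matrix when the model is built from a formula, and this error usually means that metadata is missing from the stored exog. A minimal sketch of the usual remedy, with made-up data (the names result and newdf echo the question): fit through the formula interface so statsmodels keeps the patsy design information, then predict with a DataFrame that carries the same column names.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({'y': [1.0, 2.1, 2.9, 4.2], 'x': [1, 2, 3, 4]})
result = smf.ols('y ~ x', data=df).fit()

newdf = pd.DataFrame({'x': [5, 6]})   # out-of-sample regressor values
print(result.predict(newdf))          # works because design_info travels with the fitted model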

How to get the P Value in a Variable from OLSResults in Python?

Submitted by 一世执手 on 2019-12-05 01:38:01
The OLSResults of

df2 = pd.read_csv("MultipleRegression.csv")
X = df2[['Distance', 'CarrierNum', 'Day', 'DayOfBooking']]
Y = df2['Price']
X = add_constant(X)
fit = sm.OLS(Y, X).fit()
print(fit.summary())

shows the p-values of each attribute to only 3 decimal places. I need to extract the p-value for each attribute, like Distance, CarrierNum etc., and print it in scientific notation. I can extract the coefficients using fit.params[0] or fit.params[1] etc.; I need the same for all their p-values. Also, what does it mean that all the p-values are 0? We have to do fit.pvalues[i] to get the answer, where i is the
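A short sketch of one way to do this, assuming the same fit object as above: because the exog is a DataFrame, fit.pvalues is a pandas Series indexed by variable name, so individual values can be pulled by label and formatted in scientific notation. (Values printed as 0.000 by summary() are just rounded; the underlying floats are rarely exactly zero.)

print(fit.pvalues['Distance'])                            # one p-value as a full-precision float
for name in fit.pvalues.index:                            # every p-value in scientific notation
    print('{}: {:.3e}'.format(name, fit.pvalues[name]))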

Python pandas has no attribute ols - Error (rolling OLS)

Submitted by 亡梦爱人 on 2019-12-05 01:26:48
Question: For my evaluation, I wanted to run a rolling 1000-window OLS regression estimation of the dataset found at this URL: https://drive.google.com/open?id=0B2Iv8dfU4fTUa3dPYW5tejA0bzg using the following Python script.

# /usr/bin/python -tt
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.formula.api import ols

df = pd.read_csv('estimated.csv', names=('x','y'))
model = pd.stats.ols.MovingOLS(y=df.Y, x=df[['y']], window_type='rolling', window=1000, intercept
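Context for the error: pd.stats.ols.MovingOLS was deprecated and then removed from pandas (gone as of pandas 0.20), which is why the attribute lookup fails. A hedged sketch of a replacement using RollingOLS from statsmodels (available in 0.10+); the column names follow the question's read_csv call:

import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS

df = pd.read_csv('estimated.csv', names=('x', 'y'))
exog = sm.add_constant(df[['x']])                    # intercept plus the regressor
rolling = RollingOLS(df['y'], exog, window=1000).fit()
print(rolling.params.head())                         # one row of coefficient estimates per window end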

Predicting values using an OLS model with statsmodels

Submitted by 断了今生、忘了曾经 on 2019-12-05 00:52:17
I calculated a model using OLS (multiple linear regression). I divided my data into train and test halves, and now I would like to predict values for the second half of the labels.

model = OLS(labels[:half], data[:half])
predictions = model.predict(data[half:])

The problem is that I get an error:

File "/usr/local/lib/python2.7/dist-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/regression/linear_model.py", line 281, in predict
    return np.dot(exog, params)
ValueError: matrices are not aligned

I have the following array shapes:
data.shape: (426, 215)
labels.shape: (426,)
If I
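A minimal sketch of the usual fix, reusing the question's variable names: the model has to be fitted first, and predict is then called on the fitted results object rather than on the bare model.

import statsmodels.api as sm

results = sm.OLS(labels[:half], data[:half]).fit()   # fit before predicting
predictions = results.predict(data[half:])           # new exog must have the same 215 columns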

Fama Macbeth Regression in Python (Pandas or Statsmodels)

Submitted by 梦想的初衷 on 2019-12-04 20:49:44
Question: Econometric background. A Fama-MacBeth regression is a procedure for running regressions on panel data (where there are N different individuals and each individual is observed over multiple periods T, e.g. days, months, years), so in total there are N x T observations. Note that it is fine if the panel is not balanced. The Fama-MacBeth regression first runs a cross-sectional regression for each period, i.e. it pools the N individuals together within a given period t, and does this for t = 1, ..., T. So in total T
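A hedged sketch of the two-step procedure described above (the helper name and the period/return/characteristic column arguments are illustrative assumptions, not code from the thread): run one cross-sectional OLS per period, then average the per-period coefficients over time, taking standard errors from their time-series variation.

import numpy as np
import pandas as pd
import statsmodels.api as sm

def fama_macbeth(df, period_col, y_col, x_cols):
    # Step 1: one cross-sectional OLS per period t = 1, ..., T.
    gammas = []
    for _, cross_section in df.groupby(period_col):
        X = sm.add_constant(cross_section[x_cols])
        gammas.append(sm.OLS(cross_section[y_col], X).fit().params)
    gammas = pd.DataFrame(gammas)
    # Step 2: time-series mean of the T coefficient vectors, with
    # standard errors from their variation across periods.
    return gammas.mean(), gammas.std() / np.sqrt(len(gammas))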

[Statsmodels]: How can I get statsmodels to return the p-value of an OLS object?

Submitted by 怎甘沉沦 on 2019-12-04 20:12:53
I'm quite new to programming and I'm jumping into Python to get some familiarity with data analysis and machine learning. I am following a tutorial on backward elimination for a multiple linear regression. Here is the code right now:

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

# Taking care of missin' data
#np.set_printoptions(threshold=100)
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values =
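The p-values the elimination loop needs live on the fitted results object as pvalues. A hedged sketch of the backward-elimination step such tutorials build toward (the 0.05 threshold and the helper name are assumptions; X and y are the arrays prepared above):

import numpy as np
import statsmodels.api as sm

def backward_elimination(X, y, significance=0.05):
    X = sm.add_constant(X)
    while True:
        results = sm.OLS(y, X).fit()
        worst = int(np.argmax(results.pvalues))      # position of the largest p-value
        if results.pvalues[worst] <= significance:
            return results                           # every remaining regressor is significant
        X = np.delete(X, worst, axis=1)              # drop the least significant column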

Creating dummy variables using pandas or statsmodels for the interaction of two columns

Submitted by 点点圈 on 2019-12-04 19:40:35
I have a data frame like this:

Index  ID   Industry  years_spend  asset
6646   892  4         4            144.977037
2347   315  10        8            137.749138
7342   985  1         5            104.310217
137    18   5         5            156.593396
2840   381  11        2            229.538828
6579   883  11        1            171.380125
1776   235  4         7            217.734377
2691   361  1         2            148.865341
815    110  15        4            233.309491
2932   393  17        5            187.281724

I want to create dummy variables for Industry x years_spend, which gives len(df.Industry.value_counts()) * len(df.years_spend.value_counts()) variables; for example, d_11_4 = 1 for all rows that have Industry == 11 and years_spend == 4, and d_11_4 = 0 otherwise. Then I can use these vars for some
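A hedged sketch of one way to get these columns with pandas (the toy frame below only mimics the question's shape): concatenate the two columns into a single label per row and one-hot encode that label, which yields exactly one 0/1 column per observed Industry x years_spend pair. If the dummies never need to exist as explicit columns, the formula interface can express the same interaction as C(Industry):C(years_spend).

import pandas as pd

df = pd.DataFrame({'Industry':    [4, 10, 1, 5, 11],
                   'years_spend': [4, 8, 5, 5, 2]})
labels = 'd_' + df['Industry'].astype(str) + '_' + df['years_spend'].astype(str)
dummies = pd.get_dummies(labels)          # one 0/1 column per observed pair, e.g. d_11_2
df = pd.concat([df, dummies], axis=1)
print(df.head())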

Logistic Regression in statsmodels “LinAlgError: Singular matrix”

Submitted by Deadly on 2019-12-04 19:31:24
Not sure why, but I'm getting a "numpy.linalg.linalg.LinAlgError: Singular matrix" error when fitting a logistic regression model.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import statsmodels.api as sm

data = load_breast_cancer()
y = data.target
X = data.data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2)
X_train = sm.add_constant(X_train)
X_test = sm.add_constant(X_test)
model = sm.Logit(y_train, X_train)
fit = model.fit()  # error appears on this line
fit.summary2()
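For context, the breast-cancer features are strongly collinear and sit on very different scales, so the Hessian that Logit's default Newton solver inverts can become numerically singular. A hedged sketch of one common workaround, continuing the question's variables: standardize the features and use an optimizer that does not invert the exact Hessian at every step; statsmodels' fit_regularized() is another option for ill-conditioned problems.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(X_train[:, 1:])                    # skip the added constant column
X_train_std = sm.add_constant(scaler.transform(X_train[:, 1:]))
model = sm.Logit(y_train, X_train_std)
fit = model.fit(method='bfgs', maxiter=200)                      # BFGS avoids the per-step Hessian inversion
print(fit.summary2())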

How to extract the regression coefficient from statsmodels.api?

Submitted by 故事扮演 on 2019-12-04 19:07:19
Question:

result = sm.OLS(gold_lookback, silver_lookback).fit()

After I get the result, how can I get the coefficient and the constant? In other words, if y = ax + c, how do I get the values of a and c?

Answer 1: You can use the params property of a fitted model to get the coefficients. For example, the following code:

import statsmodels.api as sm
import numpy as np
np.random.seed(1)
X = sm.add_constant(np.arange(100))
y = np.dot(X, [1,2]) + np.random.normal(size=100)
result = sm.OLS(y, X).fit()
print(result
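A short follow-up sketch to the answer's example: params is ordered like the columns of X, and add_constant puts the constant first, so the intercept c precedes the slope a.

c, a = result.params                          # roughly 1 and 2 here, matching np.dot(X, [1, 2])
print('slope a =', a, 'intercept c =', c)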