statsmodels | 易学教程

Mathematical background of statsmodels wls_prediction_std

Question: wls_prediction_std returns the standard deviation and confidence interval of my fitted model data. I need to know how the confidence intervals are calculated from the covariance matrix. (I already tried to figure it out by looking at the source code but wasn't able to.) I was hoping someone could help me out by writing out the mathematical expression behind wls_prediction_std.

Answer 1: There should be a variation on this in any textbook, without the weights. For OLS, Greene (5th
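As a hedged sketch of the usual textbook form (not a quotation of the truncated answer), the prediction interval for observation i in a weighted least squares fit can be written as below. The symbols are assumptions introduced here: x_i is the regressor row, w_i its weight, \hat{\sigma}^2 the residual mean square, \widehat{Cov}(\hat{\beta}) the estimated parameter covariance matrix, n the number of observations and k the number of parameters. Setting every w_i to 1 gives the OLS case.

```latex
\begin{align}
\hat{y}_i &= x_i^\top \hat{\beta}, \\
\widehat{\operatorname{Var}}(y_i - \hat{y}_i) &= \frac{\hat{\sigma}^2}{w_i}
    + x_i^\top \, \widehat{\operatorname{Cov}}(\hat{\beta}) \, x_i, \\
\text{interval}_i &= \hat{y}_i \pm t_{1-\alpha/2,\; n-k}
    \sqrt{\widehat{\operatorname{Var}}(y_i - \hat{y}_i)} .
\end{align}
```

The first variance term is the noise of a new observation and the second is the uncertainty of the estimated coefficients, which is where the covariance matrix enters.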

MNLogit in statsmodel returning nan

Question: I'm trying to use statsmodels' MNLogit function on the famous iris data set. I get "Current function value: nan" when I try to fit a model. Here is the code I am using:

    import statsmodels.api as st
    iris = st.datasets.get_rdataset('iris','datasets')
    y = iris.data.Species
    x = iris.data.ix[:, 0:4]
    x = st.add_constant(x, prepend = False)
    mdl = st.MNLogit(y, x)
    mdl_fit = mdl.fit()
    print (mdl_fit.summary())

Answer 1: In the iris example we can perfectly predict Setosa. This causes problems with
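To illustrate why perfect prediction is troublesome, here is a minimal NumPy-only sketch. It uses a simplified binary logit with one regressor and made-up data, not the multinomial iris model from the question. When the classes are perfectly separated, the log-likelihood keeps increasing as the coefficient grows, so no finite maximum likelihood estimate exists and an optimizer can drift toward overflow.

```python
import numpy as np

def log_likelihood(beta, x, y):
    """Binary logit log-likelihood for a single coefficient, no intercept."""
    z = beta * x
    # log(sigmoid(z)) = -logaddexp(0, -z); log(1 - sigmoid(z)) = -logaddexp(0, z)
    return np.sum(-y * np.logaddexp(0, -z) - (1 - y) * np.logaddexp(0, z))

# Hypothetical, perfectly separated data: y is 1 exactly when x > 0.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

# The log-likelihood rises toward 0 as beta grows, without ever peaking.
values = [log_likelihood(b, x, y) for b in (1.0, 10.0, 100.0)]
for b, v in zip((1.0, 10.0, 100.0), values):
    print(f"beta={b:6.1f}  loglik={v:.3e}")
```

Dropping the perfectly predicted class or using a penalized fit are common ways around this, though which one fits the iris case is a judgment call.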

Fitting ARMA model to time series indexed by time in python

Question: I am trying to fit an ARMA model to a time series stored in a pandas dataframe. The dataframe has one column of values of type numpy.float64 named "val" and an index of pandas timestamps. The timestamps are in the "Year-Month-Day Hour:Minute:Second" format. I understand that the following code:

    from statsmodels.tsa.arima_model import ARMA
    model = ARMA(df["val"], (1,0))

gives me the error message:

    ValueError: Given a pandas object and the index does not contain dates

because I have not

seasonal_decompose: operands could not be broadcast together with shapes on a series

Question: I know there are many questions on this topic, but none of them helped me to solve this problem. I'm really stuck on this. With a simple series:

                  0
    2016-01-31  266
    2016-02-29  235
    2016-03-31  347
    2016-04-30  514
    2016-05-31  374
    2016-06-30  250
    2016-07-31  441
    2016-08-31  422
    2016-09-30  323
    2016-10-31  168
    2016-11-30  496
    2016-12-31  303

    import statsmodels.api as sm
    logdf = np.log(df[0])
    decompose = sm.tsa.seasonal_decompose(logdf,freq=12, model='additive')
    decomplot = decompose.plot()

I keep getting:

statsmodels summary to latex

Question: I'm a newbie to LaTeX and I want to import a statsmodels (Python package) summary into my report in LaTeX. I found that it's possible to transform a summary into a LaTeX tabular with the following method: latex_as_tabular. Until now everything is working. Now I have to store the tabular, but I don't really understand how this works. I suppose I have to use the following commands:

    x_values=sm.add_constant(x_values)
    model=sm.OLS(y_values, x_values)
    results=model.fit()
    tbl=results.summary(xname=['b

Pandas Dataframe AttributeError: 'DataFrame' object has no attribute 'design_info'

Question: I am trying to use the predict() function of the statsmodels.formula.api OLS implementation. When I pass a new data frame to the function to get predicted values for an out-of-sample dataset, result.predict(newdf) returns the following error: 'DataFrame' object has no attribute 'design_info'. What does this mean and how do I fix it? The full traceback is:

        p = result.predict(newdf)
      File "C:\Python27\lib\site-packages\statsmodels\base\model.py", line 878, in predict
        exog = dmatrix(self.model

How to get the P Value in a Variable from OLSResults in Python?

Question: The OLSResults of

    df2 = pd.read_csv("MultipleRegression.csv")
    X = df2[['Distance', 'CarrierNum', 'Day', 'DayOfBooking']]
    Y = df2['Price']
    X = add_constant(X)
    fit = sm.OLS(Y, X).fit()
    print(fit.summary())

shows the p-values of each attribute to only 3 decimal places. I need to extract the p-value for each attribute, such as Distance, CarrierNum, etc., and print it in scientific notation. I can extract the coefficients using fit.params[0] or fit.params[1] etc. I need to do the same for all of their p-values.

Predicting values using an OLS model with statsmodels

Question: I calculated a model using OLS (multiple linear regression). I divided my data into train and test sets (half each), and I would like to predict values for the second half of the labels.

    model = OLS(labels[:half], data[:half])
    predictions = model.predict(data[half:])

The problem is that I get an error:

      File "/usr/local/lib/python2.7/dist-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/regression/linear_model.py", line 281, in predict
        return np.dot(exog, params)
    ValueError: matrices

Statsmodels.formula.api OLS does not show statistical values of intercept

Question: I am running the following source code:

    import statsmodels.formula.api as sm
    # Add one column of ones for the intercept term
    X = np.append(arr= np.ones((50, 1)).astype(int), values=X, axis=1)
    regressor_OLS = sm.OLS(endog=y, exog=X).fit()
    print(regressor_OLS.summary())

where X is a 50x5 (before adding the intercept term) numpy array which looks like this:

    [[0 1 165349.20 136897.80 471784.10]
     [0 0 162597.70 151377.59 443898.53]...]

and y is a 50x1 numpy array with float values for the dependent variable. The first two columns are for a dummy variable with three different values. The rest of

[Statsmodels]: How can I get statsmodel to return the pvalue of an OLS object?

Question: I'm quite new to programming and I'm jumping on Python to get some familiarity with data analysis and machine learning. I am following a tutorial on backward elimination for a multiple linear regression. Here is the code right now:

    # Importing the libraries
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

    # Importing the dataset
    dataset = pd.read_csv('50_Startups.csv')
    X = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, 4].values

    #Taking care of missin' data
    #np.set