statsmodels

VAR model with pandas + statsmodels in Python

十年热恋 submitted on 2019-12-12 08:09:40
Question: I am an avid user of R, but recently switched to Python for a few different reasons. However, I am struggling a little to run the vector AR model in Python from statsmodels. Q#1. I get an error when I run this, and I suspect it has something to do with the type of my vector. import numpy as np import statsmodels.tsa.api from statsmodels import datasets import datetime as dt import pandas as pd from pandas import Series from pandas import DataFrame import os df = pd.read_csv('myfile

Python-Statsmodels ARIMA Out-Of-Sample forecast raises ValueError: no rule for interpreting end

时光毁灭记忆、已成空白 submitted on 2019-12-12 04:25:00
Question: I am trying to use the statsmodels Python library, and in particular the ARIMA model. I have some trouble performing an out-of-sample forecast with it. Here is my call: predicted = my_arima_result.predict(start=startDateTime, end=endDateTime) From my understanding, I need to pass start and end parameters, but when passing an end parameter I get the following error: ValueError: no rule for interpreting end. I did some tests and the call only works for in-sample calls, passing at most the start

When installing statsmodels, I get the following error: RuntimeError: dictionary changed size during iteration

為{幸葍}努か submitted on 2019-12-12 02:49:42
Question: I have read a lot of posts about this error. I am posting because I get the error when trying to install the statsmodels package, not when running one of my own programs. How do I correct the error when installing a package? $ sudo pip3 install statsmodels Downloading/unpacking statsmodels Downloading statsmodels-0.5.0.tar.gz (5.5MB): 5.5MB downloaded Running setup.py (path:/tmp/pip_build_root/statsmodels/setup.py) egg_info for package statsmodels Traceback (most recent call last): File

OLS of statsmodels does not work with inversely proportional data?

青春壹個敷衍的年華 submitted on 2019-12-12 02:04:46
Question: I'm trying to perform an Ordinary Least Squares regression with some inversely proportional data, but the fitting result seems to be wrong. import statsmodels.formula.api as sm import numpy as np import matplotlib.pyplot as plt y = np.arange(100, 0, -1) x = np.arange(0, 100) result = sm.OLS(y, x).fit() fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(20, 4), sharey=True) ax.plot(x, result.fittedvalues, 'r-') ax.plot(x, y, 'x') fig.show() Answer 1: You're not adding a constant as the

Specifying a Constant in Statsmodels Linear Regression?

回眸只為那壹抹淺笑 submitted on 2019-12-11 22:34:21
Question: I want to use the statsmodels.regression.linear_model.OLS package to do a prediction, but with a specified constant. Currently, I can specify the presence of a constant with an argument (from the docs: http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLS.html): class statsmodels.regression.linear_model.OLS(endog, exog=None, missing='none', hasconst=None), where hasconst is a boolean. What I want to do is explicitly specify a constant C, and then fit a linear

Why does my Pandas join shift rows of the joined data?

北战南征 submitted on 2019-12-11 19:59:20
Question: In Pandas, when I join, the joined data is misaligned with respect to the original DataFrame: import os import pandas as pd import statsmodels.formula.api as sm import numpy as np import matplotlib.pyplot as plt flu_train = pd.read_csv('FluTrain.csv') # From: https://courses.edx.org/c4x/MITx/15.071x/asset/FluTrain.csv cols = ['Ystart', 'Mstart', 'Dstart', 'Yend', 'Mend', 'Dend'] flu_train = flu_train.join(pd.DataFrame(flu_train.Week.str.findall('\d+').tolist(), dtype=np.int64, columns=cols))
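A general illustration of a frequent cause of this symptom, which may or may not be what happens in the truncated code above: `DataFrame.join` aligns rows by index label, not by position. A DataFrame built from a list gets a fresh 0..n−1 index, so if the original frame's index has gaps (for example after filtering), rows pair up wrongly. The tiny frame and the values in `parts` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"week": ["a", "b", "c", "d"]})
df = df[df["week"] != "b"]  # filtering leaves index labels 0, 2, 3

parts = [[1], [3], [4]]  # hypothetical values derived from df, one per row

# The new frame has index 0, 1, 2: label 2 gets the wrong row, label 3 gets NaN
misaligned = df.join(pd.DataFrame(parts, columns=["num"]))

# Reusing df.index keeps each derived value on its source row
aligned = df.join(pd.DataFrame(parts, columns=["num"], index=df.index))
```

Passing `index=df.index` when constructing the derived frame (or calling `reset_index(drop=True)` on the original first) keeps the two sides aligned.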

python OLS statsmodels T Stats of variables not entered into the model

ⅰ亾dé卋堺 submitted on 2019-12-11 17:43:38
Question: Hi, I have created an OLS regression using statsmodels. I've written some code that loops through every variable in a dataframe, enters it into the model, records the t-stat in a new dataframe, and builds a list of potential variables. However, I have 20,000 variables, so it takes ages to run each time. Can anyone think of a better approach? This is my current approach: TStatsOut=pd.DataFrame() for i in VarsOut: try: xstrout='+'.join([baseterms,i]) fout='ymod~'+xstrout modout = smf.ols(fout

Best model for variable selection with big data?

故事扮演 submitted on 2019-12-11 16:38:08
Question: I posted a question earlier about some code, but now I realize I should be broader with the general idea. Basically, I'm trying to build a statistical model with about 1000 observations and 2000 variables. I would like to determine which variables are most influential in affecting my dependent variable with high significance. I don't plan to use the model for prediction, just for variable selection. My independent variables are binary and my dependent variable is continuous. I've tried

Weekday as dummy / factor variable in a linear regression model using statsmodels

白昼怎懂夜的黑 submitted on 2019-12-11 15:53:22
Question: How can I add a dummy / factor variable to a model using sm.OLS()? The details: below is a reproducible dataframe that you can pick up using Ctrl + C, then run the snippet further down for a reproducible example. Input data:
Date        A      B      weekday
2013-05-04  25.03  88.51  Saturday
2013-05-05  52.98  67.99  Sunday
2013-05-06  39.93  75.19  Monday
2013-05-07  47.31  86.99  Tuesday
2013-05-08  19.61  87.94  Wednesday
2013-05-09  39.51  83.10  Thursday
2013-05-10  21.22  62.16  Friday
2013-05-11  19.04  58

linear regression model with AR errors python

二次信任 submitted on 2019-12-11 10:46:52
Question: Is there a Python package (statsmodels/scipy/pandas/etc.) with functionality for estimating the coefficients of a linear regression model with autoregressive errors, such as the SAS implementation below? http://support.sas.com/documentation/cdl/en/etsug/63348/HTML/default/viewer.htm#etsug_autoreg_sect003.htm Answer 1: statsmodels http://www.statsmodels.org/dev/index.html has ARMA, ARIMA and SARIMAX models that take explanatory variables to model the mean. This corresponds to a