statsmodels

Using statsmodels.seasonal_decompose() without DatetimeIndex but with Known Frequency

我的未来我决定, submitted on 2019-12-10 14:33:58
Question: I have a time-series signal I would like to decompose in Python, so I turned to statsmodels.seasonal_decompose(). My data has a frequency of 48 (half-hourly). I was getting the same error as this questioner, where the solution was to change from an Int index to a DatetimeIndex. But I don't know the actual dates/times my data is from. In this GitHub thread, one of the statsmodels contributors says that "In 0.8, you should be able to specify freq as keyword argument to override the index." But

statsmodels installation: No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils

[亡魂溺海] submitted on 2019-12-10 13:28:25
Question: It seems to be a common problem, but none of the answers seem to help. The error pops up when installing statsmodels on Windows 10 (Python 3.6.2 installed):
    python setup.py install
Before that, numpy was installed:
    python numpy install
There was no error, so I assume it succeeded. But the statsmodels installation still fails with this error. I did install the MS C++ compiler (2015). I also installed the latest Anaconda (Python 3.6.1) and it did not help. The

Python - find where the plot crosses the axhline on python plot

做~自己de王妃, submitted on 2019-12-10 12:14:28
Question: I am doing some analysis on some simple data, and I am trying to plot auto-correlation and partial auto-correlation. Using these plots, I am trying to find the P and Q values to plot in my ARIMA model. I can see them on the graphs, but I am wondering if I can explicitly find, for each graph, where the plot crosses the axhline?
    plt.subplot(122)
    plt.plot(lag_pacf)
    plt.axhline(y=0, linestyle = '--', color = 'grey')
    plt.axhline(y=-1.96/np.sqrt(len(log_moving_average_difference)),linestyle = '--',color
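One possible way to locate the crossings numerically, sketched with NumPy only: look for sign changes of the curve minus the horizontal level and interpolate linearly between the two bracketing points. The helper name crossings and the sample values are hypothetical; for the confidence bounds in the question the level would be ±1.96/np.sqrt(n).

```python
import numpy as np

def crossings(y, level):
    """Fractional x positions where the polyline through y crosses `level`."""
    d = np.asarray(y, dtype=float) - level
    # A crossing lies strictly between i and i+1 when d changes sign there.
    idx = np.where(np.sign(d[:-1]) * np.sign(d[1:]) < 0)[0]
    # Linear interpolation between the two bracketing samples.
    return idx + d[idx] / (d[idx] - d[idx + 1])

# Made-up PACF-like values, purely for illustration.
lag_pacf = np.array([1.0, 0.6, 0.2, -0.1, 0.05, -0.02])
zero_crossings = crossings(lag_pacf, 0.0)
upper_bound_crossings = crossings(lag_pacf, 1.96 / np.sqrt(100))
```

Samples that land exactly on the level are not reported by this sketch, since the sign product is zero there; that case would need separate handling.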

ECDF in python without step function?

那年仲夏, submitted on 2019-12-10 10:47:06
Question: I have been using ECDF (empirical cumulative distribution function) from statsmodels.distributions to plot a CDF of some data. However, ECDF uses a step function, and as a consequence I get jagged-looking plots. So my question is: do scipy or statsmodels have an ECDF baked in without a step function? By the way, I know I can do this:
    hist, bin_edges = histogram(b_oz, normed=True)
    plot(np.cumsum(hist))
but I don't get the right scales. Thanks!
Answer 1: If you just want to change the plot, then you
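A small sketch of one way to get a non-stepped curve with correct scales, not necessarily what the truncated answer goes on to suggest: sort the sample and pair each sorted value with its cumulative fraction, then draw straight lines between those points. The function name ecdf_points and the sample data are illustrative.

```python
import numpy as np

def ecdf_points(sample):
    """Return sorted values and their cumulative fractions i/n."""
    x = np.sort(np.asarray(sample, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

x, y = ecdf_points([3.0, 1.0, 2.0, 4.0])

# Plotting x against y with an ordinary line plot (for example
# matplotlib's plt.plot(x, y)) joins the points linearly instead of
# drawing steps; the x axis is in data units and y runs up to 1.
```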

Get Durbin-Watson and Jarque-Bera statistics from OLS Summary in Python

醉酒当歌, submitted on 2019-12-10 09:55:58
Question: I am running the OLS summary for a column of values. Part of the OLS summary is the Durbin-Watson and Jarque-Bera (JB) statistics, and I want to pull those values out directly, since they have already been calculated, rather than computing them in extra steps like I do now with durbinwatson. Here is the code I have:
    import pandas as pd
    import statsmodels.api as sm
    csv = 'mydata.csv'
    df = pd.read_csv(csv)
    var = df[variable]
    year = df['Year']
    model = sm.OLS(var,year)
    results = model.fit()
    summary =

How to invert differencing in a Python statsmodels ARIMA forecast?

不问归期, submitted on 2019-12-10 07:33:00
Question: I'm trying to wrap my head around ARIMA forecasting using Python and statsmodels. Specifically, for the ARIMA algorithm to work, the data needs to be made stationary via differencing (or a similar method). The question is: how does one invert the differencing after the residual forecast has been made, to get back to a forecast that includes the trend and seasonality that were differenced out? (I saw a similar question here but alas, no answers have been posted.) Here's what I've done so far (based on
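For the simplest case, first-order differencing, the inversion is a cumulative sum anchored at the last known level. The sketch below uses only NumPy and made-up numbers; diff_forecast stands in for whatever a model predicts on the differenced scale and is not output from any real fit.

```python
import numpy as np

series = np.array([10.0, 12.0, 15.0, 14.0, 18.0])
diffed = np.diff(series)  # first differences: [2, 3, -1, 4]

# Undo the differencing on the observed data: first value plus running sum.
rebuilt = np.concatenate(([series[0]], series[0] + np.cumsum(diffed)))

# Hypothetical forecast of future differences (illustrative values only).
diff_forecast = np.array([1.0, -2.0, 0.5])

# Back on the original scale: last observed level plus running sum.
level_forecast = series[-1] + np.cumsum(diff_forecast)
```

Seasonal or higher-order differencing needs the same step applied once per differencing pass, in reverse order, each time anchored at the matching earlier observations.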

Why am I getting “LinAlgError: Singular matrix” from grangercausalitytests?

一世执手, submitted on 2019-12-10 03:59:50
Question: I am trying to run grangercausalitytests on two time series:
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.stattools import grangercausalitytests
    n = 1000
    ls = np.linspace(0, 2*np.pi, n)
    df1 = pd.DataFrame(np.sin(ls))
    df2 = pd.DataFrame(2*np.sin(1+ls))
    df = pd.concat([df1, df2], axis=1)
    df.plot()
    grangercausalitytests(df, maxlag=20)
However, I am getting
    Granger Causality
    number of lags (no zero) 1
    ssr based F test: F=272078066917221398041264652288.0000, p=0.0000 , df_denom=996,

Python ARIMA exogenous variable out of sample

邮差的信, submitted on 2019-12-09 16:19:55
Question: I am trying to predict a time series with the Python statsmodels ARIMA package, including an exogenous variable, but cannot figure out the correct way to insert the exogenous variable in the predict step. See here for docs.
    import numpy as np
    from scipy import stats
    import pandas as pd
    import statsmodels.api as sm
    vals = np.random.rand(13)
    ts = pd.TimeSeries(vals)
    df = pd.DataFrame(ts, columns=["test"])
    df.index = pd.Index(pd.date_range("2011/01/01", periods = len(vals), freq = 'Q'))

Difference between the interaction : and * term for formulas in StatsModels OLS regression

谁说我不能喝, submitted on 2019-12-09 13:15:50
Question: Hi, I'm learning statsmodels and can't figure out the difference between : and * (interaction terms) for formulas in StatsModels OLS regression. Could you please give me a hint to figure this out? Thank you! The documentation: http://statsmodels.sourceforge.net/devel/example_formulas.html
Answer 1: ":" gives a regression without the levels themselves, only the interaction you have mentioned. "*" gives a regression with the levels themselves plus the interaction you have mentioned. for example a .
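The distinction in the answer can be checked on a toy data set. This is a sketch assuming the formula interface statsmodels.formula.api.ols with numeric columns; the column names a, b, y and the generated data are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.standard_normal(100), "b": rng.standard_normal(100)})
df["y"] = (1 + 2 * df["a"] - df["b"] + 0.5 * df["a"] * df["b"]
           + 0.1 * rng.standard_normal(100))

# ":" adds only the product term a:b (plus the default intercept).
only_interaction = smf.ols("y ~ a:b", data=df).fit()

# "*" is shorthand for both main effects and their interaction.
full = smf.ols("y ~ a*b", data=df).fit()
explicit = smf.ols("y ~ a + b + a:b", data=df).fit()

colon_terms = list(only_interaction.params.index)
star_terms = list(full.params.index)
```

Comparing colon_terms with star_terms shows which coefficients each formula estimates, and full and explicit fit the same model.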

Converting statsmodels summary object to Pandas Dataframe

只愿长相守, submitted on 2019-12-09 10:09:10
Question: I am doing multiple linear regression with statsmodels.formula.api (ver 0.9.0) on Windows 10. After fitting the model and getting the summary with the following lines, I get the summary as a summary object.
    X_opt = X[:, [0,1,2,3]]
    regressor_OLS = sm.OLS(endog= y, exog= X_opt).fit()
    regressor_OLS.summary()
                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                      y   R-squared:                       0.951
    Model:                            OLS   Adj. R-squared:                  0.948
    Method:                 Least Squares   F