statsmodels

auto.arima() equivalent for python

Posted by 给你一囗甜甜゛ on 2019-11-26 12:21:45
Question: I am trying to predict weekly sales using ARMA/ARIMA models. I could not find a function for tuning the order (p, d, q) in statsmodels. R currently has a function, forecast::auto.arima(), which tunes the (p, d, q) parameters. How do I go about choosing the right order for my model? Are there any libraries available in Python for this purpose? Answer 1: You can implement a number of approaches: ARIMAResults include aic and bic. By their definition (see here and here), these criteria penalize for

ValueError: numpy.dtype has the wrong size, try recompiling

Posted by 孤街醉人 on 2019-11-26 12:21:33
I just installed the pandas and statsmodels packages on my Python 2.7. When I tried "import pandas as pd", this error message came up. Can anyone help? Thanks! numpy.dtype has the wrong size, try recompiling Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:\analytics\ext\python27\lib\site-packages\statsmodels-0.5.0-py2.7-win32.egg\statsmodels\formula\__init__.py", line 4, in <module> from formulatools import handle_formula_data File "C:\analytics\ext\python27\lib\site-packages\statsmodels-0.5.0-py2.7-win32.egg\statsmodels\formula\formulatools.py", line 1, in

Pythonic way of detecting outliers in one dimensional observation data

Posted by 爱⌒轻易说出口 on 2019-11-26 11:45:53
Question: For the given data, I want to set the outlier values (defined by a 95% confidence level, a 95% quantile function, or anything else that is required) to NaN. Below are my data and the code I am using right now. I would be glad if someone could explain this further. import numpy as np, matplotlib.pyplot as plt data = np.random.rand(1000)+5.0 plt.plot(data) plt.xlabel('observation number') plt.ylabel('recorded value') plt.show() Answer 1: The problem with using percentile is that the points
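Since the answer snippet is cut off, the following is only one possible alternative to a plain percentile cut, not necessarily what the answer goes on to recommend: a modified z-score based on the median absolute deviation (MAD). The function name mask_outliers_mad, the 3.5 threshold, and the two injected extreme values are assumptions made for illustration.

```python
import numpy as np


def mask_outliers_mad(values, threshold=3.5):
    """Return a float copy of values with MAD-based outliers set to NaN.

    Hypothetical helper. The modified z-score is 0.6745 * |x - median| / MAD,
    and points whose score exceeds threshold are masked.
    """
    values = np.asarray(values, dtype=float)
    median = np.nanmedian(values)
    abs_dev = np.abs(values - median)
    mad = np.nanmedian(abs_dev)
    cleaned = values.copy()
    if mad == 0:
        return cleaned  # no spread to measure against; leave data untouched
    modified_z = 0.6745 * abs_dev / mad
    cleaned[modified_z > threshold] = np.nan
    return cleaned


# Data shaped like the question's, plus two injected extreme observations.
rng = np.random.default_rng(0)
data = rng.random(1000) + 5.0
data[[10, 500]] = [9.0, 1.0]

cleaned = mask_outliers_mad(data)
print(np.isnan(cleaned).sum())
```

Unlike a fixed percentile cut, which always flags the same fraction of points, this rule flags nothing when the data contain no extreme values.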

scikit-learn & statsmodels - which R-squared is correct?

Posted by 霸气de小男生 on 2019-11-26 08:29:10
Question: I'd like to choose the best algorithm for future use. I found some solutions, but I didn't understand which R-squared value is correct. For this, I split my data into training and test sets, and I printed two different R-squared values below. import statsmodels.api as sm from sklearn.linear_model import LinearRegression from sklearn.metrics import r2_score lineer = LinearRegression() lineer.fit(x_train,y_train) lineerPredict = lineer.predict(x_test) scoreLineer = r2_score(y_test,

confidence and prediction intervals with StatsModels

Posted by 耗尽温柔 on 2019-11-26 07:23:48
Question: I run this linear regression with StatsModels: import numpy as np import statsmodels.api as sm from statsmodels.sandbox.regression.predstd import wls_prediction_std n = 100 x = np.linspace(0, 10, n) e = np.random.normal(size=n) y = 1 + 0.5*x + 2*e X = sm.add_constant(x) re = sm.OLS(y, X).fit() print(re.summary()) prstd, iv_l, iv_u = wls_prediction_std(re) My questions are: are iv_l and iv_u the lower and upper bounds of confidence intervals or of prediction intervals? How do I get the others? I need the

ValueError: numpy.dtype has the wrong size, try recompiling

Posted by Deadly on 2019-11-26 02:56:39
Question: I just installed the pandas and statsmodels packages on my Python 2.7. When I tried "import pandas as pd", this error message came up. Can anyone help? Thanks! numpy.dtype has the wrong size, try recompiling Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:\analytics\ext\python27\lib\site-packages\statsmodels-0.5.0-py2.7-win32.egg\statsmodels\formula\__init__.py", line 4, in <module> from formulatools import handle_formula_data File "C:\analytics\