statsmodels | 易学教程

Pythonic way of detecting outliers in one dimensional observation data

Question: For the given data, I want to set the outlier values (defined by a 95% confidence level, a 95% quantile function, or whatever criterion is appropriate) to NaN. Below are my data and the code I am currently using. I would be glad if someone could explain this further.

    import numpy as np
    import matplotlib.pyplot as plt
    data = np.random.rand(1000) + 5.0
    plt.plot(data)
    plt.xlabel('observation number')
    plt.ylabel('recorded value')
    plt.show()

Answer (Joe Kington): The problem with using percentiles is that the points identified as outliers are a function of your sample size. There are a huge number of ways to test …
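
A minimal sketch of one widely used robust alternative to percentile cut-offs: the median-absolute-deviation (MAD) based "modified z-score" with the conventional 3.5 threshold. The truncated answer's own method is not shown here, so treat this as one option rather than that answer's code. The function name mad_outlier_mask and the two injected outliers are illustrative assumptions; flagged values are set to NaN as the question asks.

```python
import numpy as np


def mad_outlier_mask(values, thresh=3.5):
    """Boolean mask, True where the MAD-based modified z-score exceeds thresh."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    abs_dev = np.abs(values - median)
    mad = np.median(abs_dev)
    if mad == 0:
        # At least half the points equal the median; the score is undefined.
        return np.zeros(values.shape, dtype=bool)
    # 0.6745 is roughly the 0.75 quantile of the standard normal, so
    # 0.6745 * dev / MAD is comparable to an ordinary z-score for normal data.
    modified_z = 0.6745 * abs_dev / mad
    return modified_z > thresh


rng = np.random.default_rng(0)
data = rng.random(1000) + 5.0        # same kind of data as in the question
data[[10, 500]] = [50.0, -20.0]      # two injected outliers, for illustration only

cleaned = data.copy()
cleaned[mad_outlier_mask(cleaned)] = np.nan
```

Because the median and MAD barely move when a few extreme points are added, the number of flagged points does not grow just because the sample gets larger, which is the weakness of a fixed-percentile rule.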

Deprecated rolling window option in OLS from Pandas to Statsmodels

Question: As the title suggests, where has the rolling-window option of the ols command in Pandas migrated to in statsmodels? I can't seem to find it. Pandas tells me doom is in the works:

    FutureWarning: The pandas.stats.ols module is deprecated and will be removed in a future version. We refer to external packages like statsmodels, see some examples here: http://statsmodels.sourceforge.net/stable/regression.html

    model = pd.ols(y=series_1, x=mmmm, window=50)

In fact, if you do something like: import …

Time Series Analysis - unevenly spaced measures - pandas + statsmodels

Question: I have two NumPy arrays, light_points and time_points, and would like to use some time series analysis methods on those data. I then tried this:

    import statsmodels.api as sm
    import pandas as pd
    tdf = pd.DataFrame({'time': time_points[:]})
    rdf = pd.DataFrame({'light': light_points[:]})
    rdf.index = pd.DatetimeIndex(freq='w', start=0, periods=len(rdf.light))
    #rdf.index = pd.DatetimeIndex(tdf['time'])

This works but is not doing the correct thing. Indeed, the measurements are not evenly time-spaced …
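
One common workaround, sketched here as an assumption rather than the thread's answer: index the series by the actual measurement times, then resample it onto a regular grid (daily here, with time-weighted linear interpolation for empty days) before applying methods that expect equal spacing. The unit of time_points (days since 2000-01-01) and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical irregular sample: time_points in days since an arbitrary origin.
rng = np.random.default_rng(1)
time_points = np.sort(rng.uniform(0, 100, size=300))
light_points = np.sin(time_points / 5.0) + rng.normal(scale=0.1, size=300)

# Use the real measurement times as the index instead of a synthetic weekly index.
idx = pd.to_datetime(time_points, unit="D", origin="2000-01-01")
light = pd.Series(light_points, index=idx, name="light")

# Regular daily grid: average the measurements within each day, then
# interpolate (weighted by elapsed time) the days that had no measurement.
regular = light.resample("D").mean().interpolate(method="time")
```

Interpolation invents values between real samples, so the grid should not be much finer than the typical sampling interval; otherwise the regularized series mostly reflects the interpolation rather than the data.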

ARMA out-of-sample prediction with statsmodels

Question: I'm using statsmodels to fit an ARMA model.

    import statsmodels.api as sm
    arma = sm.tsa.ARMA(data, order=(4, 4))
    results = arma.fit(full_output=False, disp=0)

where data is a one-dimensional array. I know how to get in-sample predictions:

    pred = results.predict()

Now, given a second data set, data2, how can I use the previously calibrated model to generate a series of forecasts (predictions) based on these observations?

Answer 1: I thought there was an issue for this. If you file one on GitHub, I …

auto.arima() equivalent for python

Question: I am trying to predict weekly sales using ARMA/ARIMA models. I could not find a function for tuning the order (p, d, q) in statsmodels. R has a function, forecast::auto.arima(), which tunes the (p, d, q) parameters. How do I go about choosing the right order for my model? Are there any libraries available in Python for this purpose?

Answer (behzad.nouri): You can implement a number of approaches: ARIMAResults include aic and bic. By their definition (see here and here), these criteria penalize for the number of parameters in the model, so you may use these numbers to compare the models. Also …

confidence and prediction intervals with StatsModels

Question: I am running this linear regression with StatsModels:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.sandbox.regression.predstd import wls_prediction_std
    n = 100
    x = np.linspace(0, 10, n)
    e = np.random.normal(size=n)
    y = 1 + 0.5*x + 2*e
    X = sm.add_constant(x)
    re = sm.OLS(y, X).fit()
    print(re.summary())
    prstd, iv_l, iv_u = wls_prediction_std(re)

My questions: are iv_l and iv_u the lower and upper bounds of the confidence intervals or of the prediction intervals? How do I get the other ones? I need both the confidence and the prediction intervals for all points to make a plot.

Update: see the second answer, which is more …

Capturing high multi-collinearity in statsmodels

Question: Say I fit a model in statsmodels:

    mod = smf.ols('dependent ~ first_category + second_category + other', data=df).fit()

When I call mod.summary(), I may see the following:

    Warnings:
    [1] The condition number is large, 1.59e+05. This might indicate that there are strong multicollinearity or other numerical problems.

Sometimes the warning is different (e.g. based on eigenvalues of the design matrix). How can I capture high-multicollinearity conditions in a variable? Is this warning stored somewhere …
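
The fitted results object already exposes the quantities behind those warnings as condition_number and eigenvals. A minimal sketch that stores them in variables and also computes variance inflation factors with statsmodels.stats.outliers_influence.variance_inflation_factor. The synthetic columns (other and a near-duplicate, nearly_dup) and the cut-offs (1000 for the condition number, 10 for VIF) are illustrative rules of thumb, not values from the question.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 200
other = rng.normal(size=n)
df = pd.DataFrame({
    "other": other,
    # Almost an exact copy of 'other', to provoke collinearity.
    "nearly_dup": other + rng.normal(scale=0.01, size=n),
})
df["dependent"] = 1 + 2 * df["other"] + rng.normal(size=n)

mod = smf.ols("dependent ~ other + nearly_dup", data=df).fit()

cond = mod.condition_number    # ratio of largest to smallest singular value of X
min_eig = mod.eigenvals.min()  # smallest eigenvalue of X'X

# VIF per regressor (skipping the intercept column).
exog = mod.model.exog
vif = {
    name: variance_inflation_factor(exog, i)
    for i, name in enumerate(mod.model.exog_names)
    if name != "Intercept"
}
high_collinearity = cond > 1000 or any(v > 10 for v in vif.values())
```

VIF is often more informative than the condition number alone, because it points at which regressors are involved; note that the condition number also depends on the scale of the variables.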

Weighted standard deviation in NumPy

Question: numpy.average() has a weights option, but numpy.std() does not. Does anyone have suggestions for a workaround?

Answer: How about the following short "manual calculation"?

    import math
    import numpy

    def weighted_avg_and_std(values, weights):
        """
        Return the weighted average and standard deviation.

        values, weights -- NumPy ndarrays with the same shape.
        """
        average = numpy.average(values, weights=weights)
        # Fast and numerically precise:
        variance = numpy.average((values - average)**2, weights=weights)
        return (average, math.sqrt(variance))

There is a class in statsmodels that makes it easy to calculate weighted statistics: statsmodels …
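
The class name in the last answer is cut off, but one statsmodels class that fits the description is statsmodels.stats.weightstats.DescrStatsW. A minimal sketch with made-up values and weights, cross-checked against the manual formula above. ddof=0 reproduces the manual function's divide-by-sum-of-weights variance; a nonzero ddof is subtracted from the sum of weights, which treats the weights like frequency counts.

```python
import numpy as np
from statsmodels.stats.weightstats import DescrStatsW

values = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([1.0, 1.0, 1.0, 5.0])

# ddof=0: variance = sum(w * (x - mean)**2) / sum(w), as in the manual function.
stats = DescrStatsW(values, weights=weights, ddof=0)
w_mean = stats.mean
w_std = stats.std

# Cross-check against the weighted-average formula.
manual_mean = np.average(values, weights=weights)
manual_std = np.sqrt(np.average((values - manual_mean) ** 2, weights=weights))
```

Here the weighted mean is 26 / 8 = 3.25 and the weighted variance is 9.5 / 8 = 1.1875.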

Run an OLS regression with Pandas Data Frame

Question: I have a pandas data frame and I would like to be able to predict the values of column A from the values in columns B and C. Here is a toy example:

    import pandas as pd
    df = pd.DataFrame({"A": [10, 20, 30, 40, 50],
                       "B": [20, 30, 10, 40, 50],
                       "C": [32, 234, 23, 23, 42523]})

Ideally, I would have something like ols(A ~ B + C, data = df), but when I look at the examples from algorithm libraries like scikit-learn, they appear to feed the data to the model as a list of rows instead of columns. This would …
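
The statsmodels formula interface accepts almost exactly the R-style call the question wishes for. A minimal sketch using the toy frame from the question; with only five rows and three parameters, the estimates are for illustration only.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"A": [10, 20, 30, 40, 50],
                   "B": [20, 30, 10, 40, 50],
                   "C": [32, 234, 23, 23, 42523]})

# R-style formula: regress A on B and C; an intercept is added automatically.
result = smf.ols("A ~ B + C", data=df).fit()
coefficients = result.params      # Series indexed by 'Intercept', 'B', 'C'
predicted_A = result.predict(df)  # predictions for the rows of df
```

The formula refers to column names directly, so there is no need to reshape the frame into a list of rows; result.summary() prints the usual regression table.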

Fixed effect in Pandas or Statsmodels

Question: Is there an existing function to estimate fixed effects (one-way or two-way) in Pandas or Statsmodels? There used to be a function in Statsmodels, but it seems to have been discontinued. And in Pandas there is something called plm, but I can't import it or run it using pd.plm().

Answer 1: As noted in the comments, PanelOLS has been removed from Pandas as of version 0.20.0. So you really have three options: If you use Python 3, you can use linearmodels as specified in the more recent answer: https:/
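
Besides linearmodels, plain statsmodels can estimate fixed effects with dummy variables (the least squares dummy variable, or LSDV, approach): wrap the entity and time identifiers in C() inside a formula. A minimal sketch on a synthetic balanced panel; the column names entity, time, x and y are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
entities, periods = 20, 10
df = pd.DataFrame({
    "entity": np.repeat(np.arange(entities), periods),
    "time": np.tile(np.arange(periods), entities),
})
df["x"] = rng.normal(size=len(df))
entity_effect = rng.normal(size=entities)[df["entity"].to_numpy()]
time_effect = rng.normal(size=periods)[df["time"].to_numpy()]
df["y"] = 1.5 * df["x"] + entity_effect + time_effect + rng.normal(scale=0.1, size=len(df))

# C() expands each identifier into indicator columns, dropping one level as baseline.
one_way = smf.ols("y ~ x + C(entity)", data=df).fit()            # entity effects only
two_way = smf.ols("y ~ x + C(entity) + C(time)", data=df).fit()  # entity and time effects
```

For the one-way case this gives the same slope as the within (demeaning) estimator, but it builds one column per level, so it scales poorly to panels with very many entities; that is where a dedicated panel estimator such as the one in linearmodels is more practical.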