statsmodels

Adding statsmodels 'predict' results to a Pandas dataframe

大憨熊 posted on 2019-12-17 20:28:13
Question: It is common to want to append the results of predictions to the dataset used to make the predictions, but the statsmodels predict function returns (non-indexed) results of a potentially different length than the dataset on which the predictions are based. For example, if the test dataset, test, contains any null entries, then

    mod_fit = sm.Logit.from_formula('Y ~ A + B + C', train).fit()
    press = mod_fit.predict(test)

will produce an array that is shorter than the length of test, and cannot be
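The excerpt above is cut off, but the alignment problem it describes can be sketched with pandas alone. This is a minimal sketch, not the accepted answer: `add_predictions`, `predict_fn` and `out_col` are hypothetical names, and the prediction function is assumed to return one value per row it is given, in order.

```python
import numpy as np
import pandas as pd


def add_predictions(df, predict_fn, columns, out_col="pred"):
    """Return a copy of df with predictions attached; incomplete rows get NaN.

    add_predictions, predict_fn and out_col are hypothetical names used for
    illustration. predict_fn is assumed to return one value per input row.
    """
    complete = df.dropna(subset=columns)       # rows the model can score
    preds = np.asarray(predict_fn(complete))   # plain array, any index is discarded
    out = df.copy()
    # Assigning a Series aligns on the index, so the dropped rows become NaN.
    out[out_col] = pd.Series(preds, index=complete.index)
    return out


# Stand-in for a fitted model's predict method (an assumption for the demo).
test = pd.DataFrame(
    {"A": [1.0, np.nan, 3.0], "B": [1.0, 2.0, 3.0]}, index=[10, 11, 12]
)
scored = add_predictions(test, lambda d: d["A"] + d["B"], ["A", "B"])
```

With a fitted formula model, `predict_fn` could presumably be `mod_fit.predict`, with `columns` listing the predictors used in the formula.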

How to add sum to zero constraint to GLM in Python?

怎甘沉沦 posted on 2019-12-17 16:53:21
Question: I have a model set up in Python using the statsmodels glm function, but now I want to add a sum-to-zero constraint to the model. The model is defined as follows:

    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    model = smf.glm(formula="A ~ B + C + D", data=data, family=sm.families.Poisson()).fit()

In R, to add the constraint, I would simply do something like this:

    model <- glm(A ~ B + C + D - 1, family=poisson(), data=data, contrasts=list(C="contr.sum", D="contr.sum"))

That adds the sum to zero constraint

Weighted standard deviation in NumPy

守給你的承諾、 posted on 2019-12-17 04:17:24
Question: numpy.average() has a weights option, but numpy.std() does not. Does anyone have suggestions for a workaround?

Answer 1: How about the following short "manual calculation"?

    def weighted_avg_and_std(values, weights):
        """
        Return the weighted average and standard deviation.

        values, weights -- Numpy ndarrays with the same shape.
        """
        average = numpy.average(values, weights=weights)
        # Fast and numerically precise:
        variance = numpy.average((values-average)**2, weights=weights)
        return (average, math.sqrt
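The answer's code is cut off in the excerpt above. A self-contained sketch of the same idea follows; the final return line is an assumption about how the truncated function ends, and it computes the population (biased) form with no degrees-of-freedom correction.

```python
import math

import numpy as np


def weighted_avg_and_std(values, weights):
    """Return the weighted average and the weighted (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    average = np.average(values, weights=weights)
    # Weighted mean of squared deviations from the weighted mean.
    variance = np.average((values - average) ** 2, weights=weights)
    # Assumed ending: the original excerpt stops before this point.
    return average, math.sqrt(variance)


avg, std = weighted_avg_and_std([1.0, 2.0, 4.0], [3, 1, 2])
```

With integer weights the result agrees with `numpy.std` applied to the data with each value repeated by its weight, which is a convenient sanity check.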

Difference in SGD classifier results and statsmodels results for logistic with l1

无人久伴 posted on 2019-12-14 03:59:29
Question: As a check on my work, I've been comparing the output of scikit-learn's SGDClassifier logistic implementation with statsmodels logistic. Once I add some l1 in combination with categorical variables, I'm getting very different results. Is this a result of different solution techniques, or am I not using the correct parameter? The differences are much bigger on my own dataset, but still pretty large using mtcars:

    df = sm.datasets.get_rdataset("mtcars", "datasets").data
    y, X = patsy.dmatrices('am

Comparison of results from statsmodels ARIMA with original data

穿精又带淫゛_ posted on 2019-12-14 01:47:38
Question: I have a time series with seasonal components. I fitted the statsmodels ARIMA with, for example,

    model = tsa.arima_model.ARIMA(data, (8,1,0)).fit()

Now, I understand that ARIMA differences my data. How can I compare the results from

    prediction = model.predict()
    fig, ax = plt.subplots()
    data.plot()
    prediction.plot()

as data will be the original data and prediction is differenced, and so has a mean around 0, different from the mean of data?

Answer 1: As the documentation shows, if the keyword typ is
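Independently of the `typ` keyword the answer starts to describe, the two scales can be reconciled by hand when d=1. The sketch below uses made-up numbers, and `predicted_diff` is a hypothetical stand-in for difference-scale predictions; it is an illustration of the arithmetic, not the library's own code path.

```python
import pandas as pd

# Toy series standing in for `data` (values are made up for the demo).
data = pd.Series([10.0, 12.0, 15.0, 14.0, 18.0])

# With d=1, difference-scale predictions are comparable to the first differences.
observed_diff = data.diff().dropna()

# Hypothetical difference-scale predictions aligned with observed_diff.
predicted_diff = pd.Series([1.5, 2.5, 0.0, 3.0], index=observed_diff.index)

# One-step-ahead predictions in levels: previous observed level + predicted change.
predicted_level = data.shift(1).loc[predicted_diff.index] + predicted_diff
```

Either pair can then be plotted together: `observed_diff` against `predicted_diff`, or `data` against `predicted_level`.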

module 'statsmodels.tsa.arima_model' has no arguments 'seasonal', 'xreg', 'xtransf', 'transfer' and 'include.mean'

泪湿孤枕 posted on 2019-12-13 09:49:19
Question: I'm trying to rebuild an ARIMA model in Python ('statsmodels.tsa.arima_model') that I had built in R with arima. The problem is that Python has no similar arguments ('seasonal', 'xreg', 'xtransf', 'transfer' and 'include.mean') to make it work as in R. Could anyone teach me how to do this? Thanks!

Source: https://stackoverflow.com/questions/59046327/module-statsmodels-tsa-arima-model-has-no-arguments-seasonal-xreg-xtran

statsmodels — weights in robust linear regression

半世苍凉 posted on 2019-12-13 07:26:39
Question: I was looking at the robust linear regression in statsmodels and I couldn't find a way to specify the "weights" of this regression, for example assigning a weight to each observation as in least squares regression, similar to what WLS does in statsmodels. Or is there a way to get around it? http://www.statsmodels.org/dev/rlm.html

Answer 1: RLM currently does not allow user-specified weights. Weights are internally used to implement the reweighted least squares fitting method. If the weights have the
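For background on the WLS comparison in the question: weighted least squares is equivalent to ordinary least squares on rows scaled by the square root of their weights. The NumPy sketch below checks that identity on synthetic data. Whether feeding such rescaled arrays to a robust estimator is appropriate depends on what the weights mean, so treat that step as an assumption rather than something the excerpt confirms.

```python
import numpy as np

# Synthetic regression problem (all values are made up for the demo).
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)   # per-observation weights

# Route 1: scale each row of X and y by sqrt(w), then solve plain least squares.
sw = np.sqrt(w)
beta_scaled, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# Route 2: solve the weighted normal equations (X' W X) beta = X' W y directly.
beta_direct = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
```

Both routes give the same coefficients up to floating-point error.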

Python out of sample forecasting ARIMA predict()

[亡魂溺海] posted on 2019-12-13 07:17:09
Question: Does statsmodels.api.tsa.ARIMA(mylist, (p,d,q)).fit().predict(start, end) only work for d=0? myList is a list of 72 decimals, all >0, with p=2, d=1, q=1, start=72, end=12, and the majority of the forecasts are negative decimal numbers, which leads me to believe statsmodels doesn't automatically undifference after performing the forecasts.

Answer 1: See the typ keyword of predict in the docstring. It determines whether you get predictions in terms of differences or levels. The default is 'linear'
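If the forecasts do come back on the difference scale, they can be converted to levels by hand when d=1: each forecast level is the last observed level plus the running sum of the forecast changes. A minimal sketch, with `undifference` as a hypothetical helper name and made-up numbers:

```python
import numpy as np


def undifference(last_level, diff_forecasts):
    """Turn d=1 difference forecasts into level forecasts (hypothetical helper)."""
    # Cumulative sum of the forecast changes, anchored at the last observed level.
    return last_level + np.cumsum(np.asarray(diff_forecasts, dtype=float))


levels = undifference(100.0, [-1.0, 0.5, -2.0])
```

Negative values on the difference scale therefore only indicate forecast decreases, not negative levels.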

statsmodels AR model error when calling params

两盒软妹~` posted on 2019-12-13 06:12:08
Question: New to statsmodels, trying to use statsmodels.tsa.ar_model to fit a pandas timeseries.

    # pull one series from dataframe
    y = data.sentiment
    armodel = sm.tsa.ar_model.AR(y, freq='D').fit()
    armodel.params()

gets the following error:

    C:\Python27\lib\site-packages\pandas\lib.pyd in pandas.lib.SeriesIndex.__set__ (pandas\lib.c:27817)()
    AssertionError: Index length did not match values

Any ideas?

Answer 1: You should upgrade to current master, if you can. This was fixed here.

Source: https://stackoverflow.com

Statsmodels Python - Weighted GLM

為{幸葍}努か posted on 2019-12-13 05:51:50
Question: I am currently working with significantly imbalanced data using the statsmodels package GLM (or the separate logit function if need be). Thus far I have not found a way to implement instance weighting in these methods; however, I heard that the current dev release of 0.7 may have this functionality. 1) Is there a way to implement sample weighting in the current stable release? 2) If not, has the current 0.7-dev release implemented this feature yet? While I know I can manually over/under sample