statsmodels | 易学教程

How to get the prediction of test from 2D parameters of WLS regression in statsmodels

Question: I'm incrementally up the parameters of WLS regression functions using statsmodels. I have a 10x3 dataset X that I declared like this:

    X = np.array([[1,2,3],[1,2,3],[4,5,6],[1,2,3],[4,5,6],[1,2,3],[1,2,3],[4,5,6],[4,5,6],[1,2,3]])

This is my dataset, and I have a 10x2 endog array that looks like this:

    z = [[ 3.90311860e-322  2.00000000e+000]
         [ 0.00000000e+000  2.00000000e+000]
         [ 0.00000000e+000 -2.00000000e+000]
         [ 0.00000000e+000  2.00000000e+000]
         [ 0.00000000e+000 -2.00000000e+000]
         [ 0

Python - StatsModels, OLS Confidence interval

In Statsmodels I can fit my model using

    import statsmodels.api as sm
    X = np.array([22000, 13400, 47600, 7400, 12000, 32000, 28000, 31000, 69000, 48600])
    y = np.array([0.62, 0.24, 0.89, 0.11, 0.18, 0.75, 0.54, 0.61, 0.92, 0.88])
    X2 = sm.add_constant(X)
    est = sm.OLS(y, X2)
    est2 = est.fit()

then print a nice summary using print(est2.summary()), and then extract things like the p-values using est2.pvalues, which are documented on this page: http://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.RegressionResults.html. But the summary also contains confidence intervals, and I am lost

python 3.5 in statsmodels ImportError: cannot import name '_representation'

I cannot import statsmodels.api correctly. When I try, I get this error:

    File "/home/mlv/.local/lib/python3.5/site-packages/statsmodels/tsa/statespace/tools.py", line 59, in set_mode
      from . import (_representation, _kalman_filter, _kalman_smoother,
    ImportError: cannot import name '_representation'

I have already tried reinstalling and updating it, but nothing changes. Please, I need help.

Please see the GitHub report for more detail. It turns out that statsmodels depends on several packages being installed before it, so that it can key on them to compile its own modules. I don't

Converting statsmodels summary object to Pandas Dataframe

I am doing multiple linear regression with statsmodels.formula.api (ver 0.9.0) on Windows 10. After fitting the model and getting the summary with the following lines, I get the summary as a Summary object:

    X_opt = X[:, [0,1,2,3]]
    regressor_OLS = sm.OLS(endog=y, exog=X_opt).fit()
    regressor_OLS.summary()

                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                      y   R-squared:                       0.951
    Model:                            OLS   Adj. R-squared:                  0.948
    Method:                 Least Squares   F-statistic:                     296.0
    Date:                Wed, 08 Aug 2018   Prob (F-statistic):           4.53e-30
    Time:                        00:46:48   Log-Likelihood:                -525

Fitting negative binomial in python

In scipy there is no support for fitting a negative binomial distribution to data (maybe because the negative binomial in scipy is only discrete). For a normal distribution I would just do:

    from scipy.stats import norm
    param = norm.fit(samp)

Is there a similar ready-to-use function in any other library?

Not only because it is discrete, but also because a maximum likelihood fit of the negative binomial can be quite involved, especially with an additional location parameter. That would be the reason why a .fit() method is not provided for it (and other discrete distributions in
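A minimal maximum-likelihood sketch using only scipy, assuming the location parameter is fixed at 0, a positive sample mean, and scipy's `nbinom(n, p)` parametrization (mean n(1-p)/p). The helper `fit_nbinom` is hypothetical: it starts from method-of-moments values and minimizes the negative log-likelihood with `scipy.optimize.minimize` on the unconstrained scales log(n) and logit(p).

```python
import numpy as np
from scipy import optimize, stats


def fit_nbinom(samp):
    """Hypothetical helper: ML estimates (n, p) for scipy.stats.nbinom, loc fixed at 0."""
    samp = np.asarray(samp)
    mean, var = samp.mean(), samp.var()
    # Method-of-moments start: mean = n(1-p)/p and var = n(1-p)/p**2, so p = mean/var.
    p0 = mean / var if var > mean else 0.5
    n0 = mean * p0 / (1 - p0)

    def neg_loglik(theta):
        # theta = (log n, logit p), so the optimizer never leaves the valid region.
        n = np.exp(theta[0])
        p = 1.0 / (1.0 + np.exp(-theta[1]))
        return -stats.nbinom.logpmf(samp, n, p).sum()

    x0 = [np.log(n0), np.log(p0 / (1 - p0))]
    res = optimize.minimize(neg_loglik, x0, method="Nelder-Mead")
    n_hat = np.exp(res.x[0])
    p_hat = 1.0 / (1.0 + np.exp(-res.x[1]))
    return n_hat, p_hat


rng = np.random.default_rng(42)
samp = rng.negative_binomial(5, 0.3, size=5000)  # synthetic data with n=5, p=0.3
n_hat, p_hat = fit_nbinom(samp)
print(n_hat, p_hat)
```

An intercept-only negative binomial regression in statsmodels (`sm.NegativeBinomial(samp, np.ones(len(samp))).fit()`) is another option; its NB2 parameters map back via mu = exp(const) and n = 1/alpha.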

Python ARIMA exogenous variable out of sample

I am trying to predict a time series with the statsmodels ARIMA package in Python, including an exogenous variable, but I cannot figure out the correct way to pass the exogenous variable in the predict step. See here for docs.

    import numpy as np
    from scipy import stats
    import pandas as pd
    import statsmodels.api as sm

    vals = np.random.rand(13)
    ts = pd.TimeSeries(vals)
    df = pd.DataFrame(ts, columns=["test"])
    df.index = pd.Index(pd.date_range("2011/01/01", periods = len(vals), freq = 'Q'))
    fit1 = sm.tsa.ARIMA(df, (1,0,0)).fit()
    # this works fine:
    pred1 = fit1.predict(start=12, end = 16)
    print

Ignoring missing values in multiple OLS regression with statsmodels

I'm trying to run a multiple OLS regression using statsmodels and a pandas dataframe. There are missing values in different columns for different rows, and I keep getting the error message: ValueError: array must not contain infs or NaNs I saw this SO question, which is similar but doesn't exactly answer my question: statsmodel.api.Logit: valueerror array must not contain infs or nans What I would like to do is run the regression and ignore all rows where there are missing variables for the variables I am using in this regression. Right now I have: import pandas as pd import numpy as np import

How to apply OLS from statsmodels to groupby

Question: I am running OLS on products by month. While this works fine for a single product, my dataframe contains many products. If I create a groupby object, OLS gives an error.

linear_regression_df:

       product_desc  period_num  TOTALS
    0     product_a           1      53
    3     product_a           2      52
    6     product_a           3      50
    1     product_b           1      44
    4     product_b           2      43
    7     product_b           3      41
    2     product_c           1      36
    5     product_c           2      35
    8     product_c           3      34

    from pandas import DataFrame, Series
    import statsmodels.api as sm
    linear_regression_grouped = linear_regression_df.groupby

Non-invertible ARIMA model

Question: I am trying to write code that fits a series of ARIMA models and compares them. The code is as follows:

    p=0
    q=0
    d=0
    pdq=[]
    aic=[]
    for p in range(6):
        for d in range(2):
            for q in range(4):
                arima_mod=sm.tsa.ARIMA(df,(p,d,q)).fit(transparams=True)
                x=arima_mod.aic
                x1= p,d,q
                print (x1,x)
                aic.append(x)
                pdq.append(x1)
    keys = pdq
    values = aic
    d = dict(zip(keys, values))
    print (d)
    minaic=min(d, key=d.get)
    for i in range(3):
        p=minaic[0]
        d=minaic[1]
        q=minaic[2]
    print (p,d,q)

Where 'df' is the

Pandas/Statsmodel OLS predicting future values

I've been trying to get predictions of future values from a model I've created. I have tried OLS in both pandas and statsmodels. Here is what I have in statsmodels:

    import statsmodels.api as sm
    endog = pd.DataFrame(dframe['monthly_data_smoothed8'])
    smresults = sm.OLS(dframe['monthly_data_smoothed8'], dframe['date_delta']).fit()
    sm_pred = smresults.predict(endog)
    sm_pred

The length of the array returned is equal to the number of records in my original dataframe, but the values are not the same. When I do the following using pandas, I get no values returned.

    from pandas.stats.api import ols
    res1 =