statsmodels

Retrieve model estimates from statsmodels

杀马特。学长 韩版系。学妹 submitted on 2019-12-06 01:16:37
Question: From a dataset like this: import pandas as pd import numpy as np import statsmodels.api as sm # A dataframe with two variables np.random.seed(123) rows = 12 rng = pd.date_range('1/1/2017', periods=rows, freq='D') df = pd.DataFrame(np.random.randint(100,150,size=(rows, 2)), columns=['y', 'x']) df = df.set_index(rng) ...and a linear regression model like this: x = sm.add_constant(df['x']) model = sm.OLS(df['y'], x).fit() ... you can easily retrieve some model coefficients this way: print(model

Get Durbin-Watson and Jarque-Bera statistics from OLS Summary in Python

一曲冷凌霜 submitted on 2019-12-05 21:17:02
I am running the OLS summary for a column of values. Part of the OLS is the Durbin-Watson and Jarque-Bera (JB) statistics and I want to pull those values out directly since they have already been calculated rather than running the steps as extra steps like I do now with durbinwatson. Here is the code I have: import pandas as pd import statsmodels.api as sm csv = mydata.csv df = pd.read_csv(csv) var = df[variable] year = df['Year'] model = sm.OLS(var,year) results = model.fit() summary = results.summary() print summary #print dir(results) residuals = results.resid durbinwatson = statsmodels

Fitting negative binomial in python

旧时模样 submitted on 2019-12-05 20:38:30
Question: In scipy there is no support for fitting a negative binomial distribution to data (maybe because the negative binomial in scipy is only discrete). For a normal distribution I would just do: from scipy.stats import norm param = norm.fit(samp) Is there a similar 'ready to use' function in any other library? Answer 1: Not only because it is discrete, but also because a maximum likelihood fit to the negative binomial can be quite involved, especially with an additional location

Extracting coefficients from GLM in Python using statsmodel

假装没事ソ submitted on 2019-12-05 19:37:51
Question: I have a model which is defined as follows: import statsmodels.formula.api as smf model = smf.glm(formula="A ~ B + C + D", data=data, family=sm.families.Poisson()).fit() The model has coefficients which look like so: Intercept 0.319813 C[T.foo] -1.058058 C[T.bar] -0.749859 D[T.foo] 0.217136 D[T.bar] 0.404791 B 0.262614 I can grab the values of the Intercept and B by doing model.params.Intercept and model.params.B but I can't get the values of each C and D . I have tried model.params.C[T.foo]

SandboxViolation error when installing statsmodels with easy_install

痞子三分冷 submitted on 2019-12-05 18:53:37
I tried to install statsmodels Python library on a Fedora 19 system. I used easy_install as follows: easy_install -U statsmodels But I get the following error while installing: error: SandboxViolation: os.open('/root/.matplotlib/tmpvjSAwn', 131266, 384) {} The package setup script has attempted to modify files on your system that are not within the EasyInstall build area, and has been aborted. This package cannot be safely installed by EasyInstall, and may not support alternate installation locations even if you run its setup script by hand. Please inform the package's author and the

Mathematical background of statsmodels wls_prediction_std

非 Y 不嫁゛ submitted on 2019-12-05 13:00:12
wls_prediction_std returns standard deviation and confidence interval of my fitted model data. I would need to know the way the confidence intervals are calculated from the covariance matrix. (I already tried to figure it out by looking at the source code but wasn't able to) I was hoping some of you guys could help me out by writing out the mathematical expression behind wls_prediction_std. There should be a variation on this in any textbook, without the weights. For OLS, Greene (5th edition, which I used) has se^2 = s^2 (1 + x (X'X)^{-1} x') where s^2 is the estimate of the residual variance, x

Fitting ARMA model to time series indexed by time in python

元气小坏坏 submitted on 2019-12-05 12:50:36
I am trying to fit an ARMA model to a time series stored in a pandas dataframe. The dataframe has one column of values of type numpy.float64 named "val" and an index of pandas timestamps. The timestamps are in the "Year-Month-Day Hour:Minute:Second" format. I understand that the following code: from statsmodels.tsa.arima_model import ARMA model = ARMA(df["val"], (1,0)) gives me the error message: ValueError: Given a pandas object and the index does not contain dates because I have not formatted the timestamps correctly. How can I index my dataframe so that the ARMA method accepts it while

MNLogit in statsmodel returning nan

☆樱花仙子☆ submitted on 2019-12-05 12:30:28
I'm trying to use statsmodels' MNLogit function on the famous iris data set. I get: "Current function value: nan" when I try to fit a model. Here is the code I am using: import statsmodels.api as st iris = st.datasets.get_rdataset('iris','datasets') y = iris.data.Species x = iris.data.ix[:, 0:4] x = st.add_constant(x, prepend = False) mdl = st.MNLogit(y, x) mdl_fit = mdl.fit() print (mdl_fit.summary()) In the iris example we can perfectly predict Setosa. This causes problems with (partial) perfect separation in Logit and MNLogit. Perfect separation is good for prediction, but the parameters of

How to invert differencing in a Python statsmodels ARIMA forecast?

倖福魔咒の submitted on 2019-12-05 12:03:48
I'm trying to wrap my head around ARIMA forecasting using Python and Statsmodels. Specifically, for the ARIMA algorithm to work, the data needs to be made stationary via differencing (or similar method). The question is: How does one invert the differencing after the residual forecast has been made to get back to a forecast including the trend and seasonality that was differenced out? (I saw a similar question here but alas, no answers have been posted.) Here's what I've done so far (based on the example in the last chapter of Mastering Python Data Analysis , Magnus Vilhelm Persson; Luiz
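For first-order differencing, the inversion is a cumulative sum anchored at the last observed level. The sketch below shows only that arithmetic with made-up numbers; `diff_forecast` stands in for whatever the model predicts on the differenced scale and is hypothetical.

```python
import numpy as np

# Observed series and its first difference
series = np.array([10.0, 12.0, 15.0, 19.0, 24.0])
diffed = np.diff(series)  # the stationary series the model is fitted on

# Round trip: cumulative sum of the differences rebuilds the original levels
rebuilt = series[0] + np.cumsum(diffed)

# Hypothetical forecast of the next three differences
diff_forecast = np.array([6.0, 7.0, 8.0])

# Undo the differencing: start from the last observed level and accumulate
forecast = series[-1] + np.cumsum(diff_forecast)
print(forecast)
```

Seasonal differencing at lag m is undone the same way, except that each forecast is added to the value m steps earlier instead of to the immediately preceding one.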

numpy and statsmodels give different values when calculating correlations, How to interpret this?

给你一囗甜甜゛ submitted on 2019-12-05 08:36:16
I can't find a reason why calculating the correlation between two series A and B using numpy.correlate gives me different results than the ones I obtain using statsmodels.tsa.stattools.ccf Here's an example of this difference I mention: import numpy as np from matplotlib import pyplot as plt from statsmodels.tsa.stattools import ccf #Calculate correlation using numpy.correlate def corr(x,y): result = numpy.correlate(x, y, mode='full') return result[result.size/2:] #This are the data series I want to analyze A = np.array([np.absolute(x) for x in np.arange(-1,1.1,0.1)]) B = np.array([x for x in