statsmodels

Python Statsmodels Mixedlm (Mixed Linear Model) random effects

孤街浪徒 submitted on 2019-12-01 04:58:21
Question: I am a bit confused about the output of Statsmodels MixedLM and am hoping someone could explain. I have a large dataset of single-family homes, including the previous two sale prices/sale dates for each property. I have geocoded this entire dataset and fetched the elevation for each property. I am trying to understand how the relationship between elevation and property price appreciation varies between different cities. I have used the statsmodels mixed linear model to regress price
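For reference, a minimal sketch of fitting such a model with statsmodels MixedLM, using made-up column names (price_change, elevation, city) and synthetic data rather than the questioner's dataset; groups= supplies the random-effects grouping and re_formula adds a per-city elevation slope:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic stand-in for the housing data
    rng = np.random.default_rng(0)
    n = 300
    df = pd.DataFrame({
        "city": rng.choice(["A", "B", "C"], size=n),
        "elevation": rng.uniform(0, 500, size=n),
    })
    df["price_change"] = 0.05 + 1e-4 * df["elevation"] + rng.normal(0, 0.02, size=n)

    # Random intercept for each city; re_formula="~elevation" also gives
    # each city its own random elevation slope.
    md = smf.mixedlm("price_change ~ elevation", df,
                     groups=df["city"], re_formula="~elevation")
    mdf = md.fit()
    print(mdf.summary())

The random-effects block of the summary then shows the variance of the city-level intercepts and slopes alongside the fixed effect of elevation.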

Confidence intervals for model prediction

帅比萌擦擦* submitted on 2019-12-01 04:22:58
Question: I am following along with a statsmodels tutorial. An OLS model is fitted with

    formula = 'S ~ C(E) + C(M) + X'
    lm = ols(formula, salary_table).fit()
    print lm.summary()

Predicted values are provided through:

    lm.predict({'X' : [12], 'M' : [1], 'E' : [2]})

The result is returned as a single-value array. Is there a method to also return confidence intervals for the predicted value (prediction intervals) in statsmodels? Thanks.

Answer: We've been meaning to make this easier to get to. You should be able to use

    from statsmodels.sandbox.regression.predstd import wls_prediction_std
    prstd, iv_l, iv_u = wls
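A sketch completing that suggestion on a small synthetic OLS fit (the tutorial's salary_table is not reproduced here); wls_prediction_std returns the prediction standard errors and the lower/upper prediction-interval bounds:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.sandbox.regression.predstd import wls_prediction_std

    x = np.linspace(0, 10, 50)
    X = sm.add_constant(x)
    y = 2 + 3 * x + np.random.normal(scale=2, size=50)

    res = sm.OLS(y, X).fit()

    # 95% prediction intervals for the fitted observations
    prstd, iv_l, iv_u = wls_prediction_std(res, alpha=0.05)
    print(iv_l[:5], iv_u[:5])

In more recent statsmodels versions, res.get_prediction(new_X).summary_frame(alpha=0.05) also returns both confidence intervals for the mean and prediction intervals for new observations.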

Why do R and statsmodels give slightly different ANOVA results?

ⅰ亾dé卋堺 submitted on 2019-12-01 03:13:45
Question: Using a small R sample dataset and the ANOVA example from statsmodels, the degrees of freedom for one of the variables are reported differently, and the F-value results are also slightly different. Perhaps they have slightly different default approaches? Can I set up statsmodels to use R's defaults?

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    ## R code on R sample dataset
    #> anova(with(ChickWeight, lm(weight ~ Time + Diet)))
    # Analysis of Variance
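One common source of the discrepancy (a sketch, not necessarily the accepted answer) is that R treats Diet as a factor, while a plain statsmodels formula treats it as numeric; wrapping it in C() and requesting Type I sums of squares generally reproduces R's sequential anova() table:

    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # Pull R's ChickWeight data via rdatasets
    chick = sm.datasets.get_rdataset("ChickWeight").data

    # C(Diet) mirrors R's factor coding; typ=1 gives sequential
    # (Type I) sums of squares, matching anova() in R.
    model = ols("weight ~ Time + C(Diet)", data=chick).fit()
    print(sm.stats.anova_lm(model, typ=1))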

Non-invertibility of an ARIMA model

≯℡__Kan透↙ submitted on 2019-12-01 02:56:34
I am trying to write code that generates a series of ARIMA models and compares them. The code is as follows:

    p = 0
    q = 0
    d = 0
    pdq = []
    aic = []
    for p in range(6):
        for d in range(2):
            for q in range(4):
                arima_mod = sm.tsa.ARIMA(df, (p, d, q)).fit(transparams=True)
                x = arima_mod.aic
                x1 = p, d, q
                print(x1, x)
                aic.append(x)
                pdq.append(x1)
    keys = pdq
    values = aic
    d = dict(zip(keys, values))
    print(d)
    minaic = min(d, key=d.get)
    for i in range(3):
        p = minaic[0]
        d = minaic[1]
        q = minaic[2]
        print(p, d, q)

where 'df' is the time series data. The output is as follows:

    (0, 0, 0) 1712.55522759
    (0, 0, 1) 1693.436483044094
    (0,
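The fit call can fail for some (p, d, q) combinations with errors about a non-invertible MA part or a non-stationary AR part. One common workaround, sketched here rather than taken from the original answer, is to wrap the fit in try/except so those combinations are simply skipped during the grid search (this keeps the old sm.tsa.ARIMA order-tuple API used above):

    import numpy as np
    import statsmodels.api as sm

    aic_by_order = {}
    for p in range(6):
        for d in range(2):
            for q in range(4):
                try:
                    res = sm.tsa.ARIMA(df, (p, d, q)).fit(transparams=True)
                    aic_by_order[(p, d, q)] = res.aic
                except (ValueError, np.linalg.LinAlgError):
                    # Skip orders that are non-invertible/non-stationary
                    # or that fail to converge.
                    continue

    best_order = min(aic_by_order, key=aic_by_order.get)
    print(best_order, aic_by_order[best_order])

Here df is the same time series as in the question.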

Appending predicted values and residuals to pandas dataframe

六月ゝ 毕业季﹏ submitted on 2019-11-30 22:18:33
It's a useful and common practice to append the predicted values and residuals from a regression to a dataframe as distinct columns. I'm new to pandas, and I'm having trouble performing this very simple operation. I know I'm missing something obvious. There was a very similar question asked about a year and a half ago, but it wasn't really answered. The dataframe currently looks something like this:

          y     x1    x2
     880.37   3.17    23
     716.20   4.76    26
     974.79   4.17    73
     322.80   8.70    72
    1054.25  11.45    16

All I want is to return a dataframe that has the predicted value and residual from y = x1 + x2
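A minimal sketch of how this can be done with the data shown above: fittedvalues and resid on the results object are pandas Series aligned with the dataframe's index, so they can be assigned directly as new columns (the column names y_hat and residual are arbitrary):

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "y":  [880.37, 716.20, 974.79, 322.80, 1054.25],
        "x1": [3.17, 4.76, 4.17, 8.70, 11.45],
        "x2": [23, 26, 73, 72, 16],
    })

    res = smf.ols("y ~ x1 + x2", data=df).fit()

    # Both are Series indexed like df, so assignment lines up row by row.
    df["y_hat"] = res.fittedvalues
    df["residual"] = res.resid
    print(df)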

How to visualize a nonlinear relationship in a scatter plot

孤街浪徒 submitted on 2019-11-30 21:00:25
I want to visually explore the relationship between two variables. The functional form of the relationship is not visible in a dense scatter plot like the one I am working with. How can I add a lowess smooth to the scatter plot in Python? Or do you have any other suggestions for visually exploring non-linear relationships? I tried the following, but it didn't work properly (drawing on an example from Michiel de Hoon):

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    x = np.arange(0, 10, 0.01)
    ytrue = np.exp(-x/5.0) + 2*np.sin(x/3.0)
    # add random errors with a normal distribution
    y = ytrue
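A sketch of one way to overlay a lowess smooth on the scatter, continuing the truncated snippet with made-up noise; note that statsmodels' lowess takes the response first and the predictor second, and returns the sorted (x, smoothed y) pairs:

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.nonparametric.smoothers_lowess import lowess

    x = np.arange(0, 10, 0.01)
    ytrue = np.exp(-x/5.0) + 2*np.sin(x/3.0)
    y = ytrue + np.random.normal(size=len(x))

    smoothed = lowess(y, x, frac=0.2)  # endog first, exog second

    plt.scatter(x, y, s=2, alpha=0.3)
    plt.plot(smoothed[:, 0], smoothed[:, 1], color="red", linewidth=2)
    plt.show()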

Interaction effects in patsy with patsy.dmatrices giving duplicate columns for “:” as with “+” or “*”

时光怂恿深爱的人放手 submitted on 2019-11-30 18:25:02
Question: I have a dataframe with two columns, both of which I intend to treat as categorical variables. The first column is country, which has values such as SGP, AUS, MYS, etc. The second column is time of day, which has values in 24-hour format such as 00, 11, 14, 15, etc. event is a binary variable with 1/0 flags. I understand that to categorize them I need to use patsy before running the logistic regression, and I build this using dmatrices. Use case: consider only interaction effects of country &
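For reference, a minimal sketch of building such design matrices with patsy on hypothetical toy data (it illustrates the ':' versus '*' operators rather than reproducing the duplicate-column issue described above): 'a:b' produces only the interaction terms, while 'a*b' expands to a + b + a:b.

    import pandas as pd
    from patsy import dmatrices

    df = pd.DataFrame({
        "event":   [1, 0, 1, 0, 1, 0],
        "country": ["SGP", "AUS", "MYS", "SGP", "AUS", "MYS"],
        "hour":    ["00", "11", "14", "00", "11", "14"],
    })

    # Interaction-only terms between the two categorical variables
    y, X = dmatrices("event ~ C(country):C(hour)",
                     data=df, return_type="dataframe")
    print(X.columns.tolist())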

Building multi-regression model throws error: `Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).`

十年热恋 submitted on 2019-11-30 16:34:14
Question: I have a pandas dataframe with some categorical predictors (i.e. variables) coded as 0 and 1, and some numeric variables. When I fit it to a statsmodels model like:

    est = sm.OLS(y, X).fit()

it throws:

    Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).

I converted all the dtypes of the DataFrame using df.convert_objects(convert_numeric=True). After this, all dtypes of the dataframe variables appear as int32 or int64. But at the end it still shows dtype: object, like this:

    4516 int32
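The usual fix, sketched here with hypothetical column names rather than the original data, is to force every predictor to a numeric dtype (e.g. float) before handing it to OLS, since a single object-dtype column is enough to trigger this error:

    import pandas as pd
    import statsmodels.api as sm

    # A 0/1 dummy stored as strings ends up with object dtype
    X = pd.DataFrame({"dummy": ["0", "1", "1", "0"],
                      "x": [1.2, 3.4, 5.6, 7.8]})
    y = pd.Series([1.0, 2.0, 3.0, 4.0])

    # Convert everything to numeric floats, then add the intercept column
    X = X.apply(pd.to_numeric, errors="coerce").astype(float)
    X = sm.add_constant(X)

    est = sm.OLS(y, X).fit()
    print(est.params)

(df.convert_objects has long been deprecated; pd.to_numeric or astype(float) is the current equivalent.)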

Custom priors in PyMC

流过昼夜 submitted on 2019-11-30 16:00:25
Question: Say I want to put a custom prior on two variables a and b in PyMC, e.g. p(a, b) ∝ (a + b)^(-5/2) (for the motivation behind this choice of prior, see this answer). Can this be done in PyMC? If so, how? As an example, I would like to define such a prior on a and b in the model below.

    import pymc as pm
    # ...
    # Code that defines the prior: p(a,b) ∝ (a+b)^(-5/2)
    # ...
    theta = pm.Beta("prior", alpha=a, beta=b)

    # Binomials that share a common prior
    bins = dict()
    for i in xrange(N_cities):
        bins[i] = pm
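A sketch of one way this is often done with the PyMC 2.x API used above: give a and b flat (improper) stochastics and add the joint prior through a potential that returns its log-density. The names here are illustrative and the binomial part of the model is omitted.

    import numpy as np
    import pymc as pm  # PyMC 2.x

    a = pm.Uninformative("a", value=1.0)
    b = pm.Uninformative("b", value=1.0)

    @pm.potential
    def ab_prior(a=a, b=b):
        # log p(a, b) = -5/2 * log(a + b), up to an additive constant;
        # reject non-positive values outright.
        if a <= 0 or b <= 0:
            return -np.inf
        return -2.5 * np.log(a + b)

    theta = pm.Beta("prior", alpha=a, beta=b)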