statsmodels | 易学教程

Difference in Python statsmodels OLS and R's lm

I'm not sure why I'm getting slightly different results for a simple OLS, depending on whether I go through pandas' experimental rpy interface to do the regression in R or whether I use statsmodels in Python.

    import pandas
    from rpy2.robjects import r
    from functools import partial

    loadcsv = partial(pandas.DataFrame.from_csv, index_col="seqn", parse_dates=False)
    demoq = loadcsv("csv/DEMO.csv")
    rxq = loadcsv("csv/quest/RXQ_RX.csv")

    num_rx = {}
    for seqn, num in rxq.rxd295.iteritems():
        try:
            val = int(num)
        except ValueError:
            val = 0
        num_rx[seqn] = val

    series = pandas.Series(num_rx, name="num_rx")

statsmodels linear regression - patsy formula to include all predictors in model

Question: Say I have a dataframe (let's call it DF) where y is the dependent variable and x1, x2, x3 are my independent variables. In R I can fit a linear model using the following code, and the . will include all of my independent variables in the model:

    # R code for fitting linear model
    result = lm(y ~ ., data=DF)

I can't figure out how to do this with statsmodels using patsy formulas without explicitly adding all of my independent variables to the formula. Does patsy have an equivalent to R's . ? I
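A common workaround, sketched here under the assumption that the column names are valid Python identifiers, is to build the formula string from the column names instead of relying on a `.` shorthand. The helper name `all_predictors_formula` is hypothetical, and the column list simply mirrors the y, x1, x2, x3 named in the question.

```python
def all_predictors_formula(columns, response="y"):
    """Build a patsy-style formula that regresses `response` on every other column."""
    predictors = [name for name in columns if name != response]
    if not predictors:
        raise ValueError("no predictor columns left after removing the response")
    return f"{response} ~ " + " + ".join(predictors)


# Column names taken from the question; with a real DataFrame, pass DF.columns.
formula = all_predictors_formula(["y", "x1", "x2", "x3"])
print(formula)
```

The resulting string can then be passed to `statsmodels.formula.api.ols(formula, data=DF)`. Column names containing spaces or other special characters would need patsy's `Q("...")` quoting, which this sketch does not handle.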

What are the pitfalls of using Dill to serialise scikit-learn/statsmodels models?

I need to serialise scikit-learn/statsmodels models such that all the dependencies (code + data) are packaged in an artefact and this artefact can be used to initialise the model and make predictions. Using the pickle module is not an option because this will only take care of the data dependency (the code will not be packaged). So, I have been conducting experiments with Dill. To make my question more precise, the following is an example where I build a model and persist it.

    from sklearn import datasets
    from sklearn import svm
    from sklearn.preprocessing import Normalizer
    import dill

    digits =
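The premise that pickle only carries the data can be checked with a small standard-library experiment. This is a minimal sketch that uses `fractions.Fraction` as a stand-in for a model class (an assumption for illustration, not something from the question): the pickled bytes name the module and class to import at load time, but hold none of the class's code.

```python
import pickle
from fractions import Fraction

payload = pickle.dumps(Fraction(1, 3))

# The payload records which module and class to import when loading ...
has_module_name = b"fractions" in payload
has_class_name = b"Fraction" in payload
print(has_module_name, has_class_name)

# ... so loading only succeeds where an importable fractions.Fraction exists.
restored = pickle.loads(payload)
print(restored == Fraction(1, 3))
```

The same mechanism applies to estimator classes: the artefact depends on a compatible version of the defining library being importable wherever it is loaded.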

Newey-West standard errors for OLS in Python?

I want to have a coefficient and the Newey-West standard error associated with it. I am looking for a Python library (ideally, but any working solution is fine) that can do what the following R code is doing:

    library(sandwich)
    library(lmtest)

    a <- matrix(c(1,3,5,7,4,5,6,4,7,8,9))
    b <- matrix(c(3,5,6,2,4,6,7,8,7,8,9))

    temp.lm = lm(a ~ b)

    temp.summ <- summary(temp.lm)
    temp.summ$coefficients <- unclass(coeftest(temp.lm, vcov. = NeweyWest))

    print (temp.summ$coefficients)

Result:

                 Estimate Std. Error   t value  Pr(>|t|)
    (Intercept) 2.0576208  2.5230532 0.8155281 0.4358205
    b           0.5594796  0.4071834 1.3740235 0
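For orientation, here is a minimal NumPy sketch of the Newey-West (Bartlett-kernel HAC) covariance for the same two vectors. The function name `ols_newey_west` and the choice `maxlags=1` are assumptions made for illustration. R's `NeweyWest` selects its bandwidth automatically and prewhitens by default, so the standard errors from this sketch are not expected to match the R output above; the coefficient estimates should.

```python
import numpy as np


def ols_newey_west(y, x, maxlags):
    """OLS of y on [1, x] with Bartlett-kernel (Newey-West) standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    scores = X * resid[:, None]              # per-observation x_t * e_t
    meat = scores.T @ scores                 # lag-0 term (White / HC0)
    for lag in range(1, maxlags + 1):
        weight = 1.0 - lag / (maxlags + 1.0)  # Bartlett weight
        gamma = scores[lag:].T @ scores[:-lag]
        meat += weight * (gamma + gamma.T)
    bread = np.linalg.inv(X.T @ X)
    cov = bread @ meat @ bread               # sandwich estimator, no small-sample correction
    return beta, np.sqrt(np.diag(cov))


a = [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]
b = [3, 5, 6, 2, 4, 6, 7, 8, 7, 8, 9]
coef, se = ols_newey_west(a, b, maxlags=1)   # maxlags=1 is an arbitrary choice
print(coef)
print(se)
```

In statsmodels itself, the same kind of covariance can be requested when fitting an OLS model with `fit(cov_type="HAC", cov_kwds={"maxlags": 1})`.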

Where can I find mad (mean absolute deviation) in scipy?

It seems scipy once provided a function mad to calculate the mean absolute deviation for a set of numbers: http://projects.scipy.org/scipy/browser/trunk/scipy/stats/models/utils.py?rev=3473 However, I cannot find it anywhere in current versions of scipy. Of course it is possible to just copy the old code from the repository, but I prefer to use scipy's version. Where can I find it, or has it been replaced or removed?

The current version of statsmodels has mad in statsmodels.robust:

    >>> import numpy as np
    >>> from statsmodels import robust
    >>> a = np.matrix( [
    ...     [ 80, 76, 77, 78, 79, 81, 76, 77,
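One caveat worth checking before substituting one function for the other: the question asks about the mean absolute deviation, whereas `statsmodels.robust.mad` is, to my knowledge, the median absolute deviation (scaled by a normal-consistency constant by default). The two statistics can differ a lot. The standard-library sketch below uses hypothetical helper names and made-up sample data to show the difference.

```python
from statistics import mean, median


def mean_abs_deviation(values):
    """Average absolute distance from the mean."""
    center = mean(values)
    return mean([abs(v - center) for v in values])


def median_abs_deviation(values):
    """Median absolute distance from the median (unscaled)."""
    center = median(values)
    return median([abs(v - center) for v in values])


data = [1, 2, 3, 4, 100]  # made-up sample with one outlier
mean_ad = mean_abs_deviation(data)
median_ad = median_abs_deviation(data)
print(mean_ad, median_ad)
```

The outlier dominates the mean-based statistic but barely moves the median-based one, which is why the latter is used as a robust scale estimate.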

Add trend line to pandas

Question: I have time-series data, as follows:

                      emplvl
    date
    2003-01-01  10955.000000
    2003-04-01  11090.333333
    2003-07-01  11157.000000
    2003-10-01  11335.666667
    2004-01-01  11045.000000
    2004-04-01  11175.666667
    2004-07-01  11135.666667
    2004-10-01  11480.333333
    2005-01-01  11441.000000
    2005-04-01  11531.000000
    2005-07-01  11320.000000
    2005-10-01  11516.666667
    2006-01-01  11291.000000
    2006-04-01  11223.000000
    2006-07-01  11230.000000
    2006-10-01  11293.000000
    2007-01-01  11126.666667
    2007-04-01  11383.666667
    2007-07-01
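One simple way to get a trend line for evenly spaced data like this is a first-degree least-squares fit against the observation index. The sketch below assumes NumPy is available, uses a hypothetical helper name `linear_trend`, and takes only the first eight quarterly emplvl values from the question as sample input.

```python
import numpy as np


def linear_trend(values):
    """Fit values ~ intercept + slope * t, with t = 0, 1, 2, ..."""
    y = np.asarray(values, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)  # coefficients, highest degree first
    return slope, intercept, intercept + slope * t


# First eight quarterly emplvl values from the question.
emplvl = [10955.000000, 11090.333333, 11157.000000, 11335.666667,
          11045.000000, 11175.666667, 11135.666667, 11480.333333]
slope, intercept, trend = linear_trend(emplvl)
print(slope, intercept)
```

With a pandas DataFrame, the fitted values could be stored as a new column next to emplvl and plotted together with it. Treating the quarters as equally spaced integers is an assumption that ignores the small differences in quarter length.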

OLS Regression: Scikit vs. Statsmodels?

Short version: I was using the scikit LinearRegression on some data, but I'm used to p-values so I put the data into the statsmodels OLS, and although the R^2 is about the same, the variable coefficients are all different by large amounts. This concerns me since the most likely problem is that I've made an error somewhere, and now I don't feel confident in either output (since I have likely made one model incorrectly but don't know which one).

Longer version: Because I don't know where the issue is, I don't know exactly which details to include, and including everything is probably too much. I
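The excerpt is cut off before the details, so the actual cause here is unknown. One frequent source of such mismatches is the intercept: scikit-learn's `LinearRegression` fits one by default, while `statsmodels.api.OLS` only includes a constant if one is added to the design matrix (for example with `sm.add_constant`). The NumPy sketch below, on synthetic data invented for illustration, shows how much the slope can move when the intercept is dropped.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 5 + 2x + small noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 5.0 + 2.0 * x + rng.normal(0, 0.1, size=50)

# Least squares with a constant column, as scikit-learn does by default.
X_with_const = np.column_stack([np.ones_like(x), x])
with_intercept, *_ = np.linalg.lstsq(X_with_const, y, rcond=None)

# Least squares through the origin, as sm.OLS does when no constant is added.
no_intercept, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)

print(with_intercept)  # [intercept, slope]
print(no_intercept)    # [slope only]
```

If the two libraries were given different design matrices in this way, the coefficients would disagree even though both fits are computed correctly.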

ImportError: cannot import name 'factorial'

Question: I want to use a logit model and am trying to import the statsmodels library. My version: Python 3.6.8. The best suggestion I got is to downgrade scipy, but it is unclear how to do that and which version I should downgrade to. Please help me resolve this. https://github.com/statsmodels/statsmodels/issues/5747

    import statsmodels.formula.api as smf

    ImportError Traceback (most recent call last)
    <ipython-input-52-f897a2d817de> in <module>
    ----> 1 import statsmodels.formula.api as smf
    ~/anaconda3/envs/py36/lib/python3.6

Equivalent of Stata macros in Python

Question: I am trying to use Python for statistical analysis. In Stata I can define local macros and expand them as necessary:

    program define reg2
        syntax varlist(min=1 max=1), indepvars(string) results(string)
        if "`results'" == "y" {
            reg `varlist' `indepvars'
        }
        if "`results'" == "n" {
            qui reg `varlist' `indepvars'
        }
    end

    sysuse auto, clear

So instead of:

    reg2 mpg, indepvars("weight foreign price") results("y")

I could do:

    local options , indepvars(weight foreign price) results(y)
    reg2 mpg `options'

Or
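A rough Python analogue of storing options in a local macro is to keep them in a dictionary and expand it with `**` at the call site. The sketch below is a hypothetical stand-in, not a port of the Stata program: `reg2` only builds and optionally prints a formula string rather than running a regression.

```python
def reg2(depvar, indepvars, results="y"):
    """Hypothetical stand-in for the Stata program: build a formula, print it on request."""
    formula = f"{depvar} ~ " + " + ".join(indepvars.split())
    if results == "y":
        print(formula)  # mirrors the non-quiet branch of the Stata program
    return formula


# Explicit call, analogous to: reg2 mpg, indepvars("weight foreign price") results("y")
explicit = reg2("mpg", indepvars="weight foreign price", results="y")

# "Macro" style: store the options once, expand them where needed.
options = {"indepvars": "weight foreign price", "results": "y"}
expanded = reg2("mpg", **options)
```

The returned formula string could be handed to a formula-based fitting function; the dictionary plays the role of the `options` macro and can be reused or modified between calls.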

Unable to install Statsmodels…python

I am using 32-bit cmd on 64-bit Windows with Python 2.7. When I type the command pip install statsmodels, I get the following error for some module of scipy:

    Failed building wheel for Scipy
    Failed cleaning build dir for scipy

be_good_do_good:

Install numpy:

    pip install numpy

If you face installation issues for numpy, get the pre-built Windows installers from http://www.lfd.uci.edu/~gohlke/pythonlibs/ for your Python version (the Python version is different from the Windows version).

    numpy 32-bit: numpy-1.11.1+mkl-cp27-cp27m-win32.whl
    numpy 64-bit: numpy-1.11.1+mkl-cp27-cp27m-win_amd64.whl

Later you require VC