Migrating a logistic regression from R to rpy2

Submitted by 孤人 on 2019-12-05 20:15:45

I don't know how you can get the p-values, but for the others it should be something like this:

In [24]:
#what is stored in mylogit?
mylogit.names
Out[24]:
<StrVector - Python:0x10a01a0e0 / R:0x10353ab20>

['coef..., 'resi..., 'fitt..., ..., 'meth..., 'cont..., 'xlev...]
In [25]:
#looks like the first item is the coefficients
mylogit.names[0]
Out[25]:
'coefficients'
In [26]:
#OK, let's get the coefficients.
mylogit[0]
Out[26]:
<FloatVector - Python:0x10a01a5f0 / R:0x1028bcc80>
[-3.449548, 0.002294, 0.777014, -0.560031]
In [27]:
#be careful: the indices shown by print are R indices, starting at 1. I don't see p-values here
print(mylogit.names)
 [1] "coefficients"      "residuals"         "fitted.values"    
 [4] "effects"           "R"                 "rank"             
 [7] "qr"                "family"            "linear.predictors"
[10] "deviance"          "aic"               "null.deviance"    
[13] "iter"              "weights"           "prior.weights"    
[16] "df.residual"       "df.null"           "y"                
[19] "converged"         "boundary"          "model"            
[22] "call"              "formula"           "terms"            
[25] "data"              "offset"            "control"          
[28] "method"            "contrasts"         "xlevels"   

Edit

The p-values for each term:

In [55]:
#p values:
list(summary(mylogit)[-6])[-4:]
Out[55]:
[0.0023265825120094407,
 0.03564051883525258,
 0.017659683902155117,
 1.0581094283250368e-05]

And:

In [56]:
#coefficients 
list(summary(mylogit)[-6])[:4]
Out[56]:
[-3.449548397668471,
 0.0022939595044433334,
 0.7770135737198545,
 -0.5600313868499897]
In [57]:
#S.E.
list(summary(mylogit)[-6])[4:8]
Out[57]:
[1.1328460085495897,
 0.001091839095422917,
 0.327483878497867,
 0.12713698917130048]
In [58]:
#Z value
list(summary(mylogit)[-6])[8:12]
Out[58]:
[-3.0450285137032984,
 2.1010050968680347,
 2.3726773277632214,
 -4.4049445444662885]
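The slicing above works because R stores the `summary()$coefficients` matrix column-major: the flat vector holds all estimates first, then all standard errors, then all z values, then all p-values. A minimal pure-Python sketch of that layout, using the values printed in the session above:

```python
# Flat column-major copy of summary(mylogit)$coefficients:
# 4 estimates, then 4 std. errors, then 4 z values, then 4 p-values.
flat = [
    -3.449548397668471, 0.0022939595044433334, 0.7770135737198545, -0.5600313868499897,      # Estimate
    1.1328460085495897, 0.001091839095422917, 0.327483878497867, 0.12713698917130048,        # Std. Error
    -3.0450285137032984, 2.1010050968680347, 2.3726773277632214, -4.4049445444662885,        # z value
    0.0023265825120094407, 0.03564051883525258, 0.017659683902155117, 1.0581094283250368e-05,  # Pr(>|z|)
]
n_terms = 4
# Split the flat vector into its four columns.
columns = [flat[i * n_terms:(i + 1) * n_terms] for i in range(4)]
estimates, std_errors, z_values, p_values = columns
```

This is why `[:4]` gives the coefficients, `[4:8]` the standard errors, and `[-4:]` the p-values.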

Or more generally:

In [60]:

import numpy as np
In [62]:

COEF = np.array(summary(mylogit)[-6])  # it has a shape of (number_of_terms, 4)
In [63]:

COEF[:, -1] #p-value
Out[63]:
array([  2.32658251e-03,   3.56405188e-02,   1.76596839e-02,
         1.05810943e-05])
In [66]:

COEF[:, 0] #coefficients
Out[66]:
array([ -3.44954840e+00,   2.29395950e-03,   7.77013574e-01,
        -5.60031387e-01])
In [68]:

COEF[:, 1] #S.E.
Out[68]:
array([  1.13284601e+00,   1.09183910e-03,   3.27483878e-01,
         1.27136989e-01])
In [69]:

COEF[:, 2] #Z
Out[69]:
array([-3.04502851,  2.1010051 ,  2.37267733, -4.40494454])
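If the conversion hands back only the flat column-major vector rather than a 2-D array, numpy can rebuild the `(number_of_terms, 4)` table with Fortran ordering. A sketch using the numbers from the session above:

```python
import numpy as np

# Flat column-major copy of the coefficient matrix: all estimates,
# then std. errors, then z values, then p-values (values from above).
flat = np.array([
    -3.449548, 0.002294, 0.777014, -0.560031,
    1.132846, 0.001092, 0.327484, 0.127137,
    -3.045029, 2.101005, 2.372677, -4.404945,
    2.326583e-03, 3.564052e-02, 1.765968e-02, 1.058109e-05,
])
# order="F" fills column-by-column, matching R's memory layout.
COEF = flat.reshape(4, 4, order="F")  # shape (number_of_terms, 4)
coefficients = COEF[:, 0]
p_values = COEF[:, -1]
```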

You can also use summary(mylogit).rx2('coefficients') (or rx), if you know that the coefficient matrix is stored under that name in the summary object.

This isn't quite an answer to what you asked, but if your question is more generally "how to move a logistic regression to Python", why not use statsmodels?

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
model = smf.glm('admit ~ gre + gpa + rank', df, family=sm.families.Binomial()).fit()
print(model.summary())

This prints:

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                  admit   No. Observations:                  400
Model:                            GLM   Df Residuals:                      396
Model Family:                Binomial   Df Model:                            3
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -229.72
Date:                Sat, 29 Mar 2014   Deviance:                       459.44
Time:                        11:56:19   Pearson chi2:                     399.
No. Iterations:                     5                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -3.4495      1.133     -3.045      0.002        -5.670    -1.229
gre            0.0023      0.001      2.101      0.036         0.000     0.004
gpa            0.7770      0.327      2.373      0.018         0.135     1.419
rank          -0.5600      0.127     -4.405      0.000        -0.809    -0.311
==============================================================================

While there are still some statistical procedures that only have a good implementation in R, for straightforward things like linear models, it's probably a lot easier to use statsmodels than to fight with RPy2, since all of the introspection, built-in documentation, tab completion (in IPython), etc. will work directly on statsmodels objects.
