Question
I am trying to use Python to compute a multiple linear regression and the multiple correlation between a response array and a set of predictor arrays. I have seen the very simple example for computing a multiple linear regression, which is easy. But how do I compute the multiple correlation with statsmodels, or with anything else as an alternative? I guess I could use rpy and R, but I'd prefer to stay in Python if possible.
Edit [clarification]: Considering a situation like the one described here: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704-EP713_MultivariableMethods/ I would also like to compute the multiple correlation coefficients for the predictors, in addition to the regression coefficients and the other regression parameters.
Answer 1:
You could certainly do this with statsmodels and pandas. Something like this might get you started:
import pandas
from statsmodels.formula.api import ols

# Toy data: one row per subject
data = pandas.DataFrame([["A", 4, 0, 1, 27],
                         ["B", 7, 1, 1, 29],
                         ["C", 6, 1, 0, 23],
                         ["D", 2, 0, 0, 20],
                         ["etc.", 3, 0, 1, 21]],
                        columns=["ID", "score", "male", "age20", "BMI"])

# Pairwise correlations; drop the non-numeric ID column first
print(data.drop(columns="ID").corr())

# Multiple linear regression of BMI on the three predictors
model = ols("BMI ~ score + male + age20", data=data).fit()
print(model.params)
print(model.summary())
Have a look at the documentation:
http://statsmodels.sourceforge.net/devel/
http://pandas.pydata.org/
Edit: I'm not familiar with the term "multiple correlation coefficient", but I believe it is just the square root of the R-squared of a multiple regression model, no?
print(model.rsquared ** 0.5)
# Adjusted R-squared can be negative, in which case this root is not a real number
print(model.rsquared_adj ** 0.5)
Is this what you're after?
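As a rough cross-check of the idea above, here is a minimal sketch using only NumPy (an assumption made for illustration; the answer itself uses statsmodels). It refits the same toy data by ordinary least squares with an intercept and compares the square root of R-squared with the plain correlation between the observed and fitted response, which should agree for a model that includes an intercept. The variable names are illustrative only.

```python
import numpy as np

# Same toy data as in the answer (columns: score, male, age20, BMI).
rows = np.array([
    [4, 0, 1, 27],
    [7, 1, 1, 29],
    [6, 1, 0, 23],
    [2, 0, 0, 20],
    [3, 0, 1, 21],
], dtype=float)

# Design matrix: an intercept column followed by the three predictors.
X = np.column_stack([np.ones(len(rows)), rows[:, :3]])
y = rows[:, 3]

# Ordinary least squares fit and fitted values.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

# R-squared from the residual and total sums of squares.
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Multiple correlation: correlation between observed and fitted response.
multiple_r = np.corrcoef(y, fitted)[0, 1]

print("sqrt(R^2):         ", np.sqrt(r_squared))
print("corr(y, fitted):   ", multiple_r)
```

If the two printed numbers match, that supports reading `model.rsquared ** 0.5` as the multiple correlation between the response and the set of predictors.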
Source: https://stackoverflow.com/questions/13452353/what-to-use-to-do-multiple-correlation