Question
I have a database where I want to do several multiple regressions. They all look like this:
fit <- lm(Variable1 ~ Age + Speed + Gender + Mass, data=Data)
The only thing that changes is Variable1. I want to use a loop, or something from the apply family, to put several different variables in the place of Variable1. These variables are columns in my data file. Can someone help me solve this problem? Many thanks!
What I tried so far:
When I extract one of the column names with the names() function, I do get the name of the column:
varname = as.name(names(Data[14]))
But when I fill this in (and I used the attach() function):
fit <- lm(Varname ~ Age + Speed + Gender + Mass, data=Data)
I get the following error:
Error in model.frame.default(formula = Varname ~ Age + Speed + Gender + : object is not a matrix
I suppose that the lm() function does not recognize Varname as Variable1.
Answer 1:
You can use lapply to loop over your variables.
fit <- lapply(Data[,c(...)], function(x) lm(x ~ Age + Speed + Gender + Mass, data = Data))
This gives you a list of your results.
The c(...) should contain your variable names as strings. Alternatively, you can choose the variables by their position in Data, like Data[,1:5].
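As a minimal sketch of the lapply approach described above: the data frame below is made up for illustration, and the column names Variable1 and Variable2 and the vector responses are assumptions, not part of the original question.

```r
# Hypothetical example data; predictor names mirror those in the question.
set.seed(1)
Data <- data.frame(
  Variable1 = rnorm(20),
  Variable2 = rnorm(20),
  Age    = sample(18:80, 20),
  Speed  = rnorm(20, mean = 60, sd = 10),
  Gender = rep(c("W", "M"), times = 10),
  Mass   = rnorm(20, mean = 70, sd = 8)
)

# Response columns selected by name; each column is passed to lm() as x.
responses <- c("Variable1", "Variable2")
fit <- lapply(Data[, responses], function(x) {
  lm(x ~ Age + Speed + Gender + Mass, data = Data)
})

names(fit)            # list elements are named after the selected columns
coef(fit$Variable1)   # coefficients of the first model
```

Note that selecting a single column with Data[, "Variable1"] returns a plain vector, so lapply would iterate over its elements instead of over columns. Using Data["Variable1"] or adding drop = FALSE keeps it a data frame.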
Answer 2:
The problem in your case is that the formula in the lm function treats each symbol as a literal name: it looks for a column with that name in the data, or for an object with that name to feed into the regression as a whole vector. It does not substitute a column name stored in another variable. Therefore, to use a column name held in the variable varnames, you need to build the formula from that value and combine it with the other variables.
# generate some data
set.seed(123)
Data <- data.frame(x = rnorm(30), y = rnorm(30),
                   Age = sample(0:90, 30), Speed = rnorm(30, 60, 10),
                   Gender = sample(c("W", "M"), 30, replace = TRUE), Mass = rnorm(30))
varnames <- names(Data)[1:2]
# fit regressions for multiple dependent variables
fit <- lapply(varnames,
              FUN=function(x) lm(formula(paste(x, "~Age+Speed+Gender+Mass")), data=Data))
names(fit) <- varnames
fit
$x

Call:
lm(formula = formula(paste(x, "~Age+Speed+Gender+Mass")), data = Data)

Coefficients:
(Intercept)          Age        Speed      GenderW         Mass
   0.135423     0.010013    -0.010413     0.023480     0.006939

$y

Call:
lm(formula = formula(paste(x, "~Age+Speed+Gender+Mass")), data = Data)

Coefficients:
(Intercept)          Age        Speed      GenderW         Mass
   2.232269    -0.008035    -0.027147    -0.044456    -0.023895
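As a possible variation on the paste()-based formula above, base R's reformulate() can build the same kind of formula from character vectors, and sapply() can collect the coefficients into one table. This is only a sketch: the vector predictors and the object coef_table are names introduced here for illustration, and the data is simulated.

```r
# Hypothetical data with the same layout as the example above.
set.seed(123)
Data <- data.frame(
  x = rnorm(30), y = rnorm(30),
  Age    = sample(0:90, 30),
  Speed  = rnorm(30, 60, 10),
  Gender = rep(c("W", "M"), times = 15),
  Mass   = rnorm(30)
)
varnames   <- c("x", "y")
predictors <- c("Age", "Speed", "Gender", "Mass")

# reformulate() builds e.g. x ~ Age + Speed + Gender + Mass from strings.
fit <- lapply(varnames, function(v) {
  lm(reformulate(predictors, response = v), data = Data)
})
names(fit) <- varnames

# One column of coefficients per response variable.
coef_table <- sapply(fit, coef)
coef_table
```

Because every model shares the same predictors, sapply() simplifies the result to a matrix with one row per coefficient and one column per response, which is easier to compare than the printed list of fits.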
Source: https://stackoverflow.com/questions/41241806/loop-multiple-multiple-linear-regressions-in-r