Question
I have two data frames:
x = data.frame(Var1 = c("A", "B", "C", "D", "E"),
               Var2 = c("F", "G", "H", "I", "J"),
               Value = c(11, 12, 13, 14, 18))
y = data.frame(A = c(11, 12, 13, 14, 18), B = c(15, 16, 17, 14, 18),
               C = c(17, 22, 23, 24, 18), D = c(11, 12, 13, 34, 18),
               E = c(11, 5, 13, 55, 18), F = c(8, 12, 13, 14, 18),
               G = c(7, 5, 13, 14, 18), H = c(8, 12, 13, 14, 18),
               I = c(9, 5, 13, 14, 18), J = c(11, 12, 13, 14, 18))
Var3 <- rep("time", each = length(x$Var1))
x = cbind(x, Var3)
time = seq(1:length(y[, 1]))
y = cbind(y, time)
> x
  Var1 Var2 Value Var3
1    A    F    11 time
2    B    G    12 time
3    C    H    13 time
4    D    I    14 time
5    E    J    18 time
> y
   A  B  C  D  E  F  G  H  I  J time
1 11 15 17 11 11  8  7  8  9 11    1
2 12 16 22 12  5 12  5 12  5 12    2
3 13 17 23 13 13 13 13 13 13 13    3
4 14 14 24 34 55 14 14 14 14 14    4
5 18 18 18 18 18 18 18 18 18 18    5
The first row of x names the variables A and F. I want to select these two columns in y, fit a simple regression lm(A ~ F, data = y), and save the result in the first position of a list. I will then do the same for the second row of x, fitting lm(B ~ G, data = y), and so on for the remaining rows.
How can I match the variable names in x to the data in y for a regression?
Revised question: how about a more complicated regression Var1 ~ Var2 + Var3?
Answer 1:
x = data.frame(Var1= c("A", "B", "C", "D","E"),
Var2=c("F","G","H","I","J"),
Value= c(11, 12, 13, 14,18))
y = data.frame(A= c(11, 12, 13, 14,18),
B= c(15, 16, 17, 14,18),
C= c(17, 22, 23, 24,18),
D= c(11, 12, 13, 34,18),
E= c(11, 5, 13, 55,18),
F= c(8, 12, 13, 14,18),
G= c(7, 5, 13, 14,18),
H= c(8, 12, 13, 14,18),
I= c(9, 5, 13, 14,18),
J= c(11, 12, 13, 14,18))
We can use
fitmodel <- function (RHS, LHS) do.call("lm", list(formula = reformulate(RHS, LHS),
                                                   data = quote(y)))
modList <- Map(fitmodel, as.character(x$Var2), as.character(x$Var1))
modList[[1]] ## for example
#Call:
#lm(formula = A ~ F, data = y)
#
#Coefficients:
#(Intercept)            F
#     4.3500       0.7115
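As a minimal sketch of what reformulate does inside fitmodel (the object names f1 and f2 are only for illustration): it builds a formula object from character strings, taking the right-hand-side term labels first and the response second.

```r
# reformulate(termlabels, response) builds a formula from character strings.
f1 <- reformulate("F", response = "A")
f1
# A ~ F

# Several term labels are joined with "+".
f2 <- reformulate(c("F", "time"), response = "A")
f2
# A ~ F + time

# The result is an ordinary formula object that lm() accepts.
class(f1)
# [1] "formula"
```

This is why fitmodel can take the variable names from x as plain strings and still hand lm a proper formula.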
Remarks:
The use of do.call ensures that reformulate is evaluated before it is passed to lm, so the call stored in the model object contains the actual formula (A ~ F) rather than the expression reformulate(RHS, LHS). This is desirable because it allows functions like update to work correctly on the model object. See "Showing string in formula and not as variable in lm fit". For comparison:
oo <- Map(function (RHS, LHS) lm(reformulate(RHS, LHS), data = y),
          as.character(x$Var2), as.character(x$Var1))
oo[[1]]
#Call:
#lm(formula = reformulate(RHS, LHS), data = y)
#
#Coefficients:
#(Intercept)            F
#     4.3500       0.7115
The as.character on x$Var1 and x$Var2 is necessary because these two variables are currently "factor" variables, not strings, and reformulate can't use them (before R 4.0.0, data.frame converted strings to factors by default). If you pass stringsAsFactors = FALSE to data.frame when you build your x, there is no such issue. From R 4.0.0 on that is already the default, and the as.character calls are harmless.
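The point about update can be checked with a small sketch. It assumes a reduced version of y with only the columns A, F and time, and the names fit and fit2 are illustrative, not part of the original answer.

```r
# Reduced data: only the columns needed for this check (an assumption
# made to keep the example short).
y <- data.frame(A = c(11, 12, 13, 14, 18),
                F = c(8, 12, 13, 14, 18),
                time = 1:5)

# Fit via do.call so that the evaluated formula is stored in the call.
fit <- do.call("lm", list(formula = reformulate("F", "A"), data = quote(y)))
fit$call
# lm(formula = A ~ F, data = y)

# Because the call holds a real formula and the symbol y, update() can
# re-fit the model with an extra term.
fit2 <- update(fit, . ~ . + time)
formula(fit2)
# A ~ F + time
```

Passing data = quote(y) keeps the symbol y in the stored call instead of a printed copy of the whole data frame.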
Does it work for you? Isn't it supposed to use a "for" loop?
The Map function hides that "for" loop. It is a wrapper around the mapply function. The *apply family functions in R are syntactic sugar for such loops.
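For readers who prefer to see the loop spelled out, here is a sketch of an explicit "for" loop that should produce the same models as the Map call. It assumes x is built with stringsAsFactors = FALSE, so no as.character conversion is needed; the loop variable i and the helper name f are illustrative.

```r
x <- data.frame(Var1 = c("A", "B", "C", "D", "E"),
                Var2 = c("F", "G", "H", "I", "J"),
                stringsAsFactors = FALSE)
y <- data.frame(A = c(11, 12, 13, 14, 18), B = c(15, 16, 17, 14, 18),
                C = c(17, 22, 23, 24, 18), D = c(11, 12, 13, 34, 18),
                E = c(11, 5, 13, 55, 18),  F = c(8, 12, 13, 14, 18),
                G = c(7, 5, 13, 14, 18),   H = c(8, 12, 13, 14, 18),
                I = c(9, 5, 13, 14, 18),   J = c(11, 12, 13, 14, 18))

# Pre-allocate the result list, one slot per row of x.
modList <- vector("list", nrow(x))
for (i in seq_len(nrow(x))) {
  # Row i of x supplies the response (Var1) and the predictor (Var2).
  f <- reformulate(x$Var2[i], response = x$Var1[i])
  modList[[i]] <- do.call("lm", list(formula = f, data = quote(y)))
}
coef(modList[[1]])
```

Unlike the Map version, the resulting list is unnamed, so its elements are accessed by position only.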
Update on your revised question
Your original question constructs a model formula of the form Var1 ~ Var2.
Your new question wants Var1 ~ Var2 + Var3.
x$Var3 <- rep("time", each=length(x$Var1))
y$time <- seq(1:length(y[,1]))
## collect multiple RHS variables (using concatenation function `c`)
RHS <- Map(base::c, as.character(x$Var2), as.character(x$Var3))
#str(RHS)
#List of 5 ## oh this list has names! annoying!!
# $ F: chr [1:2] "F" "time"
# $ G: chr [1:2] "G" "time"
# $ H: chr [1:2] "H" "time"
# $ I: chr [1:2] "I" "time"
# $ J: chr [1:2] "J" "time"
LHS <- as.character(x$Var1)
modList <- Map(fitmodel, RHS, LHS) ## `fitmodel` function unchanged
modList[[1]] ## for example
#Call:
#lm(formula = A ~ F + time, data = y)
#
#Coefficients:
#(Intercept)            F         time
#        5.6          0.5          0.5
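Since Map names the result after the first right-hand-side variable (F, G, ...), one option is to rename the list by response so that models can be looked up by the Var1 name. The sketch below is self-contained and assumes stringsAsFactors = FALSE; the name var2_slopes is illustrative.

```r
x <- data.frame(Var1 = c("A", "B", "C", "D", "E"),
                Var2 = c("F", "G", "H", "I", "J"),
                Var3 = "time",
                stringsAsFactors = FALSE)
y <- data.frame(A = c(11, 12, 13, 14, 18), B = c(15, 16, 17, 14, 18),
                C = c(17, 22, 23, 24, 18), D = c(11, 12, 13, 34, 18),
                E = c(11, 5, 13, 55, 18),  F = c(8, 12, 13, 14, 18),
                G = c(7, 5, 13, 14, 18),   H = c(8, 12, 13, 14, 18),
                I = c(9, 5, 13, 14, 18),   J = c(11, 12, 13, 14, 18),
                time = 1:5)

fitmodel <- function(RHS, LHS) {
  do.call("lm", list(formula = reformulate(RHS, LHS), data = quote(y)))
}

# One character vector of right-hand-side terms per row of x.
RHS <- Map(base::c, x$Var2, x$Var3)
modList <- Map(fitmodel, RHS, x$Var1)

# Replace the names inherited from Var2 with the response names.
names(modList) <- x$Var1
modList[["A"]]

# Second coefficient of each model, i.e. the one on the Var2 term.
var2_slopes <- vapply(modList, function(m) unname(coef(m)[2]), numeric(1))
var2_slopes
```

Naming by response is a design choice; unname(modList) would instead drop the names entirely.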
Source: https://stackoverflow.com/questions/51914163/how-to-match-a-data-frame-of-variable-names-and-another-with-data-for-a-regressi