Automated formula construction

房东的猫 提交于 2019-11-26 21:03:24

问题


In a recent homework assignment, we were instructed to run 27 linear models, each time adding an additional variable (the goal was to plot the changes in R2 vs. changes in adjusted R2). I found it difficult to algorithmically create formulas like this. The code I ended up using looked like this following (note that the first column in the data frame is the dependent variable, all the rest are prospective independent variables.

 make.formula <- function(howfar) {
  formula <- c()
  for (i in 1:howfar) {
    if (i == 1) {
      formula <- paste(formula, names(d)[i], '~')}
    else if (i == howfar) {
      formula <- paste(formula, names(d)[i], '')
    }
    else {
      formula <- paste(formula, names(d)[i], '+')}
  }
  return(formula)
}

formulas <- lapply(seq(2, length(d)), make.formula)
formulas <- lapply(formulas, as.formula)
fits <- lapply(formulas, lm, data = d)

This works, but seems far from ideal, and my impression is that anything I'm doing with a for-loop in R is probably not being done the best way. Is there an easier way to algorithmically construct formulas for a given data frame?


回答1:


reformulate(), a nifty function for creating formulas from character vectors, might come in handy. Here's an example of what it does:

reformulate(response="Y", termlabels=c("X1", "X2", "X3"))
# Y ~ X1 + X2 + X3

And here's how you might use it in practice. (Note that I here create the formulas inside of the lm() calls. Because formula objects carry with them info about the environment they were created in, I'd be a bit hesitant to create them outside of the lm() call within which you actually want to use them.):

evars <- names(mtcars)[2:5]
ii <- lapply(1:4, seq_len)

lapply(ii, 
       function(X) {
          coef(lm(reformulate(response="mpg", termlabels=evars[X]), data=mtcars))
})
# [[1]]
# (Intercept)         cyl 
#    37.88458    -2.87579 
# 
# [[2]]
# (Intercept)         cyl        disp 
# 34.66099474 -1.58727681 -0.02058363 
# 
# [[3]]
# (Intercept)         cyl        disp          hp 
# 34.18491917 -1.22741994 -0.01883809 -0.01467933 
# 
# [[4]]
# (Intercept)         cyl        disp          hp        drat 
# 23.98524441 -0.81402201 -0.01389625 -0.02317068  2.15404553 



回答2:


Map can be used to solve that problem:

mydata<-mtcars

dep<-as.list(rep("mpg~",(dim(mydata)[2]-1))) # ldependent variables with ~ 
indep1<- as.list( names(mydata)[-1])
indeno<-as.list(1:(dim(mydata)[2]-1))

myreg<-Map(function(x,y) (lm(as.formula(paste(x,paste(unlist(indep[1:y]),collapse="+"))),data=mtcars))$coefficient,dep,indeno)
   > myreg
[[1]]
(Intercept)         cyl 
   37.88458    -2.87579 

[[2]]
(Intercept)         cyl        disp 
34.66099474 -1.58727681 -0.02058363 

[[3]]
(Intercept)         cyl        disp          hp 
34.18491917 -1.22741994 -0.01883809 -0.01467933 

[[4]]
(Intercept)         cyl        disp          hp        drat 
23.98524441 -0.81402201 -0.01389625 -0.02317068  2.15404553 

[[5]]
(Intercept)         cyl        disp          hp        drat          wt 
36.00835689 -1.10748650  0.01235733 -0.02401743  0.95220742 -3.67328708 

[[6]]
(Intercept)         cyl        disp          hp        drat          wt        qsec 
26.30735899 -0.81856023  0.01320490 -0.01792993  1.32040573 -4.19083238  0.40146117 

[[7]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs 
25.88354175 -0.85665309  0.01314097 -0.01733070  1.31265550 -4.22434351  0.44873351 -0.27816899 

[[8]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs          am 
15.57313068 -0.27859352  0.01471012 -0.02144242  0.81505862 -3.94373934  0.80975689  0.36835866  2.79374984 

[[9]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs          am 
12.83083549 -0.16881263  0.01623358 -0.02424055  0.70590083 -4.03214213  0.86828517  0.36470431  2.55092849 
       gear 
 0.50293618 

[[10]]
(Intercept)         cyl        disp          hp        drat          wt        qsec          vs          am 
12.30337416 -0.11144048  0.01333524 -0.02148212  0.78711097 -3.71530393  0.82104075  0.31776281  2.52022689 
       gear        carb 
 0.65541302 -0.19941925 


来源:https://stackoverflow.com/questions/18474336/automated-formula-construction

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!