short formula call for many variables when building a model [duplicate]

Submitted by 天涯浪子 on 2019-12-17 05:50:20

Question


I am trying to build a regression model with lm(...). My dataset has lots of features (>50), and I do not want to write my code as lm(output~feature1+feature2+feature3+...+feature70). What is the shorthand notation for this?


Answer 1:


You can use . as described in the help page for formula. The . stands for "all columns not otherwise in the formula".

lm(output ~ ., data = myData)
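A minimal sketch of the . shorthand, using the built-in mtcars data set purely as a stand-in for myData (mpg plays the role of output). A term such as - wt after the dot removes one column from the expansion.

```r
# Built-in data set used as a stand-in for myData; mpg plays the role of output
data(mtcars)

# "." expands to every column of mtcars except the response mpg
fit_all <- lm(mpg ~ ., data = mtcars)

# "- wt" drops a single predictor from the "." expansion
fit_no_wt <- lm(mpg ~ . - wt, data = mtcars)

print(names(coef(fit_all)))
print(names(coef(fit_no_wt)))
```

Since mtcars has 11 columns, fit_all ends up with an intercept plus 10 slopes, and fit_no_wt with one slope fewer.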

Alternatively, construct the formula manually with paste. This example is from the as.formula() help page:

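# Build the names x1, x2, ..., x25
xnam <- paste("x", 1:25, sep="")
# Paste them into "y ~ x1+x2+...+x25" and convert the string to a formula
(fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))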

You can then pass this object to the regression function: lm(fmla, data = myData)
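The stats package also provides reformulate(), which builds the same kind of formula from a character vector of term labels without pasting strings by hand. A sketch, where x1 to x25 and y are placeholder names as in the help-page example and the data are simulated only so the fit can run:

```r
# Placeholder names x1, ..., x25, as in the help-page example
xnam <- paste0("x", 1:25)

# reformulate() joins the term labels with "+" and sets y as the response
fmla2 <- reformulate(xnam, response = "y")

# Hypothetical simulated data with columns y, x1, ..., x25
set.seed(1)
myData <- as.data.frame(matrix(rnorm(100 * 26), ncol = 26,
                               dimnames = list(NULL, c("y", xnam))))

fit <- lm(fmla2, data = myData)
print(length(coef(fit)))  # intercept plus 25 slopes
```

This avoids the string round trip through paste() and as.formula(), which can be convenient when the term names come from names(myData).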




Answer 2:


You could also try something like:

lm(output ~ as.matrix(myData[, 2:71]), data = myData)

This assumes output is the first column and feature1 to feature70 are the next 70 columns.
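A minimal runnable sketch of this matrix-term approach, with simulated data standing in for myData (the column names and sizes are hypothetical). The column subset is wrapped in as.matrix() because lm() accepts a matrix as a single formula term, whereas a plain data frame subset is, as far as I can tell, rejected by model.frame() with an "invalid type (list)" error.

```r
set.seed(42)
n <- 50
# Hypothetical data: output in column 1, feature1..feature5 in columns 2:6
myData <- data.frame(output = rnorm(n),
                     matrix(rnorm(n * 5), ncol = 5,
                            dimnames = list(NULL, paste0("feature", 1:5))))

# as.matrix() turns the selected columns into one matrix-valued term
fit <- lm(output ~ as.matrix(myData[, 2:6]), data = myData)
print(coef(fit))
```

The slope names come out as the deparsed term followed by the column name, e.g. as.matrix(myData[, 2:6])feature1.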

Or

features <- paste0("feature", 1:70)
lm(output ~ as.matrix(myData[, features]), data = myData)

This is probably smarter, as it doesn't matter where among your data the columns are.
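The name-based selection can also be combined with the . shorthand: subset the data frame to the response plus the chosen feature columns, then let . expand them. A sketch with hypothetical, non-contiguous columns; unlike the matrix-term version, the coefficients keep the plain column names.

```r
set.seed(7)
n <- 40
# Hypothetical data where the feature columns are not contiguous
myData <- data.frame(id = seq_len(n), feature1 = rnorm(n), note = "x",
                     feature2 = rnorm(n), output = rnorm(n))

features <- paste0("feature", 1:2)

# Keep only the response and the named features, then let "." expand them
fit <- lm(output ~ ., data = myData[, c("output", features)])
print(names(coef(fit)))
```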

This might cause problems if rows are removed because of NAs, though.
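To check whether NA handling is affecting a fit, one option (a sketch on made-up data) is to compare nobs() with the number of supplied rows and inspect na.action(), which under the default na.omit lists the rows that lm() dropped.

```r
# Made-up data with two missing values in feature2
myData <- data.frame(output = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
                     feature1 = c(1, 2, 3, 4, 5, 6),
                     feature2 = c(0.5, NA, 1.5, 2.1, NA, 3.2))

fit <- lm(output ~ ., data = myData)

# Rows actually used versus rows supplied
print(c(used = nobs(fit), supplied = nrow(myData)))

# Indices of the rows removed by the default na.action (na.omit)
print(na.action(fit))

# Alternatively, drop incomplete rows explicitly before fitting
complete <- myData[complete.cases(myData), ]
```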



Source: https://stackoverflow.com/questions/5774813/short-formula-call-for-many-variables-when-building-a-model
