How can I do 3064 regressions using the lapply function

混江龙づ霸主 submitted on 2020-03-25 05:53:13

Question


Hi, I am just starting to use R and am stuck on analyzing my data. I have a data frame with 157 columns. Column 1 is the dependent variable and columns 2 to 157 are the independent variables: columns 2 to 79 are one type of independent variable (n = 78) and columns 80 to 157 are another type (n = 78). I want to perform 78 x 78 = 6084 multiple linear regressions, holding the first independent variable of the model fixed, one at a time, for each of columns 2 to 79. I can fix the independent variable and run the regressions separately like this:

lm(Grassland$column1 ~ Grassland$column2 +  x)
lm(Grassland$column1 ~ Grassland$column3 +  x)

lm(Grassland$column1 ~ Grassland$column79 +  x)

My question is: how can I run all 6084 regressions with a single piece of code and keep only the regressions whose p-value is < 0.05, discarding the non-significant ones?

Here is my code:

library(data.table)

Regressions <- 
data.table(Grassland)[, 
                      .(Lm = lapply(.SD, function(x) summary(lm(Grassland$column1 ~ Grassland$column2 + x)))), .SDcols = 80:157]

Regressions[, lapply(Lm, function(x) coef(x)[, "Pr(>|t|)"])] [2:3] < 0.05       

Answer 1:


We can also use reformulate to build the formula and then pass it to lm:

lapply(setdiff(names(mtcars), "mpg"), function(x) 
        lm(reformulate(x, "mpg"), data = mtcars))
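
The same idea extends to the two-predictor models in the question. The following is only a sketch on mtcars: `group_a` and `group_b` are made-up stand-ins for the two blocks of columns (2 to 79 and 80 to 157), and `expand.grid` is one of several ways to enumerate the pairs.

```r
# Illustrative split of mtcars predictors into two hypothetical groups
# (stand-ins for columns 2:79 and 80:157 of the asker's data frame).
group_a <- c("cyl", "disp", "hp")
group_b <- c("wt", "qsec", "drat")

# Every (fixed, varying) pair: 3 x 3 = 9 here, 78 x 78 = 6084 in the question.
pairs <- expand.grid(fixed = group_a, varying = group_b,
                     stringsAsFactors = FALSE)

# Fit mpg ~ fixed + varying for each pair, building each formula with reformulate.
fits <- Map(function(a, b) lm(reformulate(c(a, b), response = "mpg"), data = mtcars),
            pairs$fixed, pairs$varying)

# Label each fit with the pair of predictors it uses.
names(fits) <- paste(pairs$fixed, pairs$varying, sep = " + ")
```

With the real data, the two groups would presumably be `names(Grassland)[2:79]` and `names(Grassland)[80:157]`, with the name of column 1 as the response.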



Answer 2:


First, data.table isn't necessarily going to help you here; a plain lapply outside of data.table works fine. We generate the formulas programmatically (here I'll use most of mtcars), then fit each formula to the data.

paste("mpg ~", setdiff(names(mtcars), "mpg"))
#  [1] "mpg ~ cyl"  "mpg ~ disp" "mpg ~ hp"   "mpg ~ drat" "mpg ~ wt"   "mpg ~ qsec" "mpg ~ vs"  
#  [8] "mpg ~ am"   "mpg ~ gear" "mpg ~ carb"

regressions <- lapply(paste("mpg ~", setdiff(names(mtcars), "mpg")),
                      function(frm) lm(as.formula(frm), data=mtcars))

regressions[1:2]
# [[1]]
# Call:
# lm(formula = as.formula(frm), data = mtcars)
# Coefficients:
# (Intercept)          cyl  
#      37.885       -2.876  
# [[2]]
# Call:
# lm(formula = as.formula(frm), data = mtcars)
# Coefficients:
# (Intercept)         disp  
#    29.59985     -0.04122  
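
To keep only the significant regressions, one possibility is to compute a p-value per model and subset the list. This is a sketch under an assumption: the question does not say which p-value is meant, so the overall F-test p-value is used here, and `model_p` is a hypothetical helper, not part of base R.

```r
predictors <- setdiff(names(mtcars), "mpg")
regressions <- lapply(paste("mpg ~", predictors),
                      function(frm) lm(as.formula(frm), data = mtcars))
names(regressions) <- predictors

# Hypothetical helper: overall F-test p-value of a fitted lm.
# summary.lm() stores the F statistic and its degrees of freedom
# in a named vector with elements "value", "numdf" and "dendf".
model_p <- function(fit) {
  f <- summary(fit)$fstatistic
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

p_values <- vapply(regressions, model_p, numeric(1))

# Keep only the models whose overall p-value is below 0.05.
significant <- regressions[p_values < 0.05]
```

If the p-value of a particular coefficient is what matters instead, it can be read from `coef(summary(fit))[, "Pr(>|t|)"]` and used in the same kind of subsetting.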


Source: https://stackoverflow.com/questions/60817436/how-can-i-do-3064-regressions-using-the-lapply-function
