Question
Hi, I am just starting to use R and am stuck on analyzing my data. I have a data frame with 157 columns. Column 1 is the dependent variable and columns 2 to 157 are the independent variables: columns 2 to 79 are one type of independent variable (n = 78) and columns 80 to 157 are another type (n = 78). I want to fit 78 x 78 = 6084 multiple linear regressions, holding the first independent variable in the model fixed, one at a time, for each of columns 2 to 79. I can fix the independent variable and run the regressions separately like this:
lm(Grassland$column1 ~ Grassland$column2 + x)
lm(Grassland$column1 ~ Grassland$column3 + x)
lm(Grassland$column1 ~ Grassland$column79 + x)
My question is: how can I run all 6084 regressions with a single piece of code and keep only the regressions whose p-value is < 0.05, discarding the non-significant ones?
Here is my code:
library(data.table)
Regressions <-
  data.table(Grassland)[,
    .(Lm = lapply(.SD, function(x) summary(lm(Grassland$column1 ~ Grassland$column2 + x)))),
    .SDcols = 80:157]
Regressions[, lapply(Lm, function(x) coef(x)[, "Pr(>|t|)"])][2:3] < 0.05
Answer 1:
We can also use reformulate() to build each formula and then pass it to lm():
lapply(setdiff(names(mtcars), "mpg"), function(x)
  lm(reformulate(x, "mpg"), data = mtcars))
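The call above fits one predictor at a time, whereas the question asks for two predictors per model, one from each group of columns. A minimal sketch of how the same reformulate() idea might extend to every pair follows. It assumes mtcars as a stand-in for Grassland, and the names response, fixed, varying, pairs and models are illustrative only, not part of the original answer:

```r
# Stand-in data: mtcars replaces Grassland (assumption for illustration).
response <- "mpg"
fixed    <- c("cyl", "disp", "hp")    # plays the role of columns 2:79
varying  <- c("drat", "wt", "qsec")   # plays the role of columns 80:157

# Every (fixed, varying) combination: 3 x 3 = 9 here,
# 78 x 78 = 6084 with the column groups described in the question.
pairs <- expand.grid(fixed = fixed, varying = varying,
                     stringsAsFactors = FALSE)

# Fit one two-predictor model per row of `pairs`.
models <- Map(function(f, v) lm(reformulate(c(f, v), response), data = mtcars),
              pairs$fixed, pairs$varying)

# Label each fit with the predictors it contains.
names(models) <- paste(pairs$fixed, pairs$varying, sep = " + ")
```

With the real data, fixed and varying would presumably be names(Grassland)[2:79] and names(Grassland)[80:157], and data = Grassland.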
Answer 2:
One, data.table isn't necessarily going to help you here; a plain lapply outside of data.table works fine. First we generate the formulas programmatically (here I'll use most of the columns in mtcars), then we fit each formula to the data.
paste("mpg ~", setdiff(names(mtcars), "mpg"))
# [1] "mpg ~ cyl" "mpg ~ disp" "mpg ~ hp" "mpg ~ drat" "mpg ~ wt" "mpg ~ qsec" "mpg ~ vs"
# [8] "mpg ~ am" "mpg ~ gear" "mpg ~ carb"
regressions <- lapply(paste("mpg ~", setdiff(names(mtcars), "mpg")),
                      function(frm) lm(as.formula(frm), data = mtcars))
regressions[1:2]
# [[1]]
# Call:
# lm(formula = as.formula(frm), data = mtcars)
# Coefficients:
# (Intercept) cyl
# 37.885 -2.876
# [[2]]
# Call:
# lm(formula = as.formula(frm), data = mtcars)
# Coefficients:
# (Intercept) disp
# 29.59985 -0.04122
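The question also asks to keep only the regressions with p < 0.05, which the list above does not yet do. One possible way to filter it is sketched below. The question does not say which p-value is meant, so this assumes that "significant" means every non-intercept coefficient has p < 0.05. The helper is_significant is hypothetical, not an existing function:

```r
# Rebuild the list of fits so this block runs on its own (mtcars as stand-in data).
predictors  <- setdiff(names(mtcars), "mpg")
regressions <- lapply(paste("mpg ~", predictors),
                      function(frm) lm(as.formula(frm), data = mtcars))
names(regressions) <- predictors

# Hypothetical helper: TRUE when every non-intercept coefficient has p < alpha.
is_significant <- function(fit, alpha = 0.05) {
  p <- coef(summary(fit))[-1, "Pr(>|t|)"]   # drop the intercept row
  all(p < alpha)
}

# Keep only the fits that pass the check; names are preserved.
significant <- Filter(is_significant, regressions)
names(significant)
```

If the overall F-test p-value is the intended criterion instead, it would have to be computed from summary(fit)$fstatistic with pf(), since summary.lm does not store it directly.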
Source: https://stackoverflow.com/questions/60817436/how-can-i-do-3064-regressions-using-the-lapply-function