How to Loop/Repeat a Linear Regression in R

自闭症患者 2020-12-07 21:47

I have figured out how to make a table in R with 4 variables, which I am using for multiple linear regressions. The dependent variable (Lung) for each regression is taken f

3 Answers
  • 2020-12-07 21:56

    You want to run 22,000 linear regressions and extract the coefficients? That's simple to do from a coding standpoint.

    set.seed(1)
    
    # number of columns in the Lung and Blood data.frames. 22,000 for you?
    n <- 5 
    
    # dummy data
    obs <- 50 # observations
    Lung <- data.frame(matrix(rnorm(obs*n), ncol=n))
    Blood <- data.frame(matrix(rnorm(obs*n), ncol=n))
    Age <- sample(20:80, obs)
    Gender  <- factor(rbinom(obs, 1, .5))
    
    # run n regressions
    my_lms <- lapply(1:n, function(x) lm(Lung[,x] ~ Blood[,x] + Age + Gender))
    
    # extract just coefficients
    sapply(my_lms, coef)
    
    # if you need more info, get full summary call. now you can get whatever, like:
    summaries <- lapply(my_lms, summary)
    # ...coefficients with p values:
    lapply(summaries, function(x) x$coefficients[, c(1,4)])
    # ...or r-squared values
    sapply(summaries, function(x) c(r_sq = x$r.squared, 
                                    adj_r_sq = x$adj.r.squared))
    

    The models are stored in a list, where model 3 (with DV Lung[, 3] and IVs Blood[,3] + Age + Gender) is in my_lms[[3]] and so on. You can use apply functions on the list to perform summaries, from which you can extract the numbers you want.
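
    If one row per model is enough, the list can be flattened into a single data frame. The following is a minimal sketch, assuming the same dummy data as above; the helper name `tidy_one` and the choice to report only the `Blood[, x]` term (row 2 of the coefficient table) are illustrative assumptions, not part of the original answer.

```r
set.seed(1)
n <- 5; obs <- 50
Lung <- data.frame(matrix(rnorm(obs * n), ncol = n))
Blood <- data.frame(matrix(rnorm(obs * n), ncol = n))
Age <- sample(20:80, obs)
Gender <- factor(rbinom(obs, 1, .5))

my_lms <- lapply(1:n, function(x) lm(Lung[, x] ~ Blood[, x] + Age + Gender))

# hypothetical helper: estimate and p-value of the Blood term (row 2) for one model
tidy_one <- function(fit) {
  cf <- summary(fit)$coefficients
  data.frame(estimate = cf[2, "Estimate"], p_value = cf[2, "Pr(>|t|)"])
}

# one row per regression, with the model index alongside
results <- do.call(rbind, lapply(my_lms, tidy_one))
results$model <- seq_len(n)
results
```

    With 22,000 models the same call yields a 22,000-row data frame that can be sorted or filtered by `p_value`.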

  • 2020-12-07 22:05

    The question seems to be about how to call regression functions with formulas which are modified inside a loop.

    Here is how you can do it (using the diamonds dataset from ggplot2):

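    diamonds <- ggplot2::diamonds   # pass the data explicitly instead of attach()
    strCols <- names(diamonds)      # strCols[7] is "price"; strCols[8:10] are "x", "y", "z"
    
    formula <- list(); model <- list()
    for (i in 1:1) {                # raise the upper bound (up to 3) to loop over more predictors
      formula[[i]] <- as.formula(paste0(strCols[7], " ~ ", strCols[7+i]))
      model[[i]] <- glm(formula[[i]], data = diamonds)
    
      # then you can plot or do anything else with the result ...
      png(filename = sprintf("diamonds_price=glm(%s).png", strCols[7+i]))
      par(mfrow = c(2, 2))
      plot(model[[i]])
      dev.off()
    }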
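
    As an alternative to pasting strings, base R's `reformulate()` builds the formula object directly from character vectors. A minimal sketch, using the built-in `mtcars` data so that it runs without ggplot2; the response `mpg` and the three predictors are assumptions chosen only for illustration.

```r
response <- "mpg"
predictors <- c("wt", "hp", "disp")

# one model per predictor, stored in a list named by the predictor
models <- lapply(setNames(predictors, predictors), function(p) {
  f <- reformulate(p, response = response)   # e.g. mpg ~ wt
  glm(f, data = mtcars)
})

# slope of each single-predictor model
slopes <- sapply(models, function(m) unname(coef(m)[2]))
slopes
```

    Naming the list elements makes it easy to look up a model later, for example `models$wt`.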
    
  • 2020-12-07 22:17

    Sensible or not, to make the loop at least somehow work you need:

    y <- c(1,5,6,2,5,10)        # response
    x1 <- c(2,12,8,1,16,17)     # predictor
    x2 <- c(2,14,5,1,17,17)     # predictor
    df <- data.frame(y, x1, x2) # lm() looks up the variables here
    predictorlist <- list("x1","x2")
    for (i in predictorlist){
      model <- lm(paste("y ~", i), data=df)
      print(summary(model))
    }
    

    The paste function builds each formula as a string (for example "y ~ x1"), which lm() then converts to a formula.
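
    To keep the fitted models instead of only printing them, the same loop can store each fit in a named list. This is a sketch, assuming the toy vectors above are gathered into a data frame called `df`; the names `fits` and `r_squared` are illustrative.

```r
df <- data.frame(y  = c(1, 5, 6, 2, 5, 10),
                 x1 = c(2, 12, 8, 1, 16, 17),
                 x2 = c(2, 14, 5, 1, 17, 17))
predictorlist <- c("x1", "x2")

# fit one model per predictor and store it under the predictor's name
fits <- list()
for (p in predictorlist) {
  fits[[p]] <- lm(as.formula(paste("y ~", p)), data = df)
}

# R-squared for each single-predictor model
r_squared <- sapply(fits, function(m) summary(m)$r.squared)
r_squared
```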
