How to run an lm regression for every column in R

慢半拍i 2020-12-22 08:31

I have a data frame like this:

df=data.frame(x=rnorm(100),y1=rnorm(100),y2=rnorm(100),y3=...)

I want to run a loop which regresses each column start

3 Answers
  • 2020-12-22 09:05
    library(tidyverse)
    df <- data.frame(x=rnorm(100),y1=rnorm(100),y2=rnorm(100))
    

    Running head(df) shows something like:

           x          y1          y2
    1 -0.8955473  0.96571502 -0.16232461
    2  0.5054406 -2.74246178 -0.18120499
    3  0.1680144 -0.06316372 -0.53614623
    4  0.2956123  0.94223922  0.38358329
    5  1.1425223  0.43150919 -0.32185672
    6 -0.3457060 -1.16637706 -0.06561134 
    
    models <- df %>% 
      pivot_longer(
        cols = starts_with("y"),
        names_to = "y_name",
        values_to = "y_value"
      ) 
    

    After this step, head(models) gives:

           x y_name y_value
       <dbl> <chr>    <dbl>
    1 -0.896 y1      0.966 
    2 -0.896 y2     -0.162 
    3  0.505 y1     -2.74  
    4  0.505 y2     -0.181 
    5  0.168 y1     -0.0632
    6  0.168 y2     -0.536 
    

    Next, split(.$y_name) splits the data into one piece per level of y_name, and map(~lm(y_value ~ x, data = .)) fits the same model to each piece.

    After these two steps, head(models) gives:

    $y1
    
    Call:
    lm(formula = y_value ~ x, data = .)
    
    Coefficients:
    (Intercept)            x  
        0.14924      0.08237  
    
    
    $y2
    
    Call:
    lm(formula = y_value ~ x, data = .)
    
    Coefficients:
    (Intercept)            x  
        0.11183      0.03141  
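
    The split-and-fit step can also be sketched in base R, which may make it easier to see what split() and map() are doing. This is only an illustration, not the answer's own code: the names long, pieces and models_base are made up here, and set.seed() is added only so the run is repeatable.

    ```r
    set.seed(42)  # assumption: fixed seed so the sketch is repeatable
    df <- data.frame(x = rnorm(100), y1 = rnorm(100), y2 = rnorm(100))

    # Long format by hand: one row per (x, y_name, y_value) combination.
    long <- data.frame(
      x       = rep(df$x, times = 2),
      y_name  = rep(c("y1", "y2"), each = nrow(df)),
      y_value = c(df$y1, df$y2)
    )

    # split() returns a named list of data frames, one per level of y_name.
    pieces <- split(long, long$y_name)

    # lapply() plays the role of map(): fit the same model to every piece.
    models_base <- lapply(pieces, function(d) lm(y_value ~ x, data = d))

    names(models_base)
    ```

    Each element of models_base is an ordinary lm object, so coef(), summary() and so on work on it directly.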
    

    To tidy the results, continue the pipeline with:

      tibble(
        dvsub = names(.),
        untidied = .
        ) %>%
      mutate(tidy = map(untidied, broom::tidy)) %>%
      unnest(tidy) 
    

    View(models) then shows a table like this:

      dvsub untidied     term        estimate std.error statistic p.value
      <chr> <named list> <chr>          <dbl>     <dbl>     <dbl>   <dbl>
    1 y1    <lm>         (Intercept)   0.0367    0.0939     0.391   0.697
    2 y1    <lm>         x             0.0399    0.0965     0.413   0.680
    3 y2    <lm>         (Intercept)   0.0604    0.109      0.553   0.582
    4 y2    <lm>         x            -0.0630    0.112     -0.561   0.576
    

    So the whole code is as follows:

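    models <- df %>% 
      # reshape to long format: one row per (x, y_name, y_value)
      pivot_longer(
        cols = starts_with("y"),
        names_to = "y_name",
        values_to = "y_value"
      ) %>%
      # one data frame per response column
      split(.$y_name) %>%
      # fit the same regression to each piece
      map(~lm(y_value ~ x, data = .)) %>%
      # collect the fitted models in a tibble, keyed by column name
      tibble(
        dvsub = names(.),
        untidied = .
      ) %>%
      # one row per coefficient: estimate, std.error, statistic, p.value
      mutate(tidy = map(untidied, broom::tidy)) %>%
      unnest(tidy) 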
    
  • 2020-12-22 09:07

    Another solution with broom and tidyverse:

    library(tidyverse)
    library(broom)
    df <- data.frame(x=rnorm(100),y1=rnorm(100),y2=rnorm(100))
    
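    result <- df %>% 
      # pivot_longer() is the current replacement for gather()
      pivot_longer(-x, names_to = "measure", values_to = "value") %>%
      # nest(-measure) is deprecated; name the list-column explicitly
      nest(data = -measure) %>%
      mutate(fit = map(data, ~ lm(value ~ x, data = .x)),
             tidied = map(fit, tidy)) %>%
      unnest(tidied)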
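
    Once the coefficients are in one tidy table, the slope for each column can be pulled out with ordinary dplyr verbs. A minimal sketch, assuming tidyverse and broom are installed and a tidyr version that has pivot_longer(); the name slopes and the fixed seed are illustrative additions, not part of the answer:

    ```r
    library(tidyverse)
    library(broom)

    set.seed(1)  # assumption: fixed seed so the sketch is repeatable
    df <- data.frame(x = rnorm(100), y1 = rnorm(100), y2 = rnorm(100))

    result <- df %>%
      pivot_longer(-x, names_to = "measure", values_to = "value") %>%
      nest(data = -measure) %>%
      mutate(fit = map(data, ~ lm(value ~ x, data = .x)),
             tidied = map(fit, tidy)) %>%
      unnest(tidied)

    # Keep only the slope rows: one row per response column.
    slopes <- result %>%
      filter(term == "x") %>%
      select(measure, estimate, std.error, p.value)

    slopes
    ```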
    
  • 2020-12-22 09:08

    Your code looks fine, except that inside lm the variable i holds a character string (the column name), not the column itself, and you can't regress a string on x. Wrapping it in get() pulls out the column whose name is stored in i.

    df=data.frame(x=rnorm(100),y1=rnorm(100),y2=rnorm(100),y3=rnorm(100))
    
    storage <- list()
    for(i in names(df)[-1]){
      storage[[i]] <- lm(get(i) ~ x, df)
    }
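
    A variant of the same loop, in case get() feels opaque: base R's reformulate() builds the formula from the column name, so each model is fitted with a formula that names the real column. This is a sketch of an alternative, not what the loop above does; the helper variable f and the fixed seed are illustrative additions.

    ```r
    set.seed(1)  # assumption: fixed seed, only to make the sketch repeatable
    df <- data.frame(x = rnorm(100), y1 = rnorm(100), y2 = rnorm(100), y3 = rnorm(100))

    storage <- list()
    for (i in names(df)[-1]) {
      # reformulate() turns the strings into the formula y1 ~ x, y2 ~ x, ...
      f <- reformulate("x", response = i)
      storage[[i]] <- lm(f, data = df)
    }

    # The stored formula names the actual column rather than get(i).
    formula(storage[["y2"]])
    ```

    The fitted coefficients are the same as with get(i); only the formula recorded in each model differs.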
    

    I create an empty list storage, which I'm going to fill up with each iteration of the loop. It's just a personal preference but I'd also advise against how you've written your current loop:

    for(i in names(df[,-1])){
      model = lm(i~x, data=df)
    }
    

    This overwrites model on every pass, so only the result of the last iteration survives. I suggest you store the results in a list, or in a matrix that you fill in iteratively.
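
    If only the coefficients are needed, the matrix option might look like the sketch below. The names ys and coefs, the column labels and the fixed seed are illustrative choices, not something from the question.

    ```r
    set.seed(1)  # assumption: fixed seed so the sketch is repeatable
    df <- data.frame(x = rnorm(100), y1 = rnorm(100), y2 = rnorm(100), y3 = rnorm(100))

    ys <- names(df)[-1]

    # Preallocate one row per response column and one column per coefficient.
    coefs <- matrix(NA_real_, nrow = length(ys), ncol = 2,
                    dimnames = list(ys, c("intercept", "slope")))

    for (i in ys) {
      # coef() returns c(intercept, slope) for a one-predictor model.
      coefs[i, ] <- coef(lm(get(i) ~ x, df))
    }

    coefs
    ```

    A list keeps the full lm objects (residuals, summaries and so on), while a matrix like this keeps only the numbers but is easier to scan or export.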

    Hope that helps
