Running several linear regressions from a single dataframe in R

一世执手 提交于 2019-11-29 17:58:55

Just to add an alternative, I would propose going down this route:

library(reshape2)
library(dplyr)
library(broom)

df <- melt(data.frame(x = 1962:2014, 
                      y1 = rnorm(53), 
                      y2 = rnorm(53), 
                      y3 = rnorm(53)), 
          id.vars = "x")

df %>% group_by(variable) %>% do(tidy(lm(value ~ x, data=.)))

Here, I just melt the data so that all relevant columns are given by groups of rows, to be able to use dplyr's grouped actions. This gives the following dataframe as output:

Source: local data frame [6 x 6]
Groups: variable [3]

  variable        term     estimate    std.error  statistic   p.value
    (fctr)       (chr)        (dbl)        (dbl)      (dbl)     (dbl)
1       y1 (Intercept) -3.646666114 18.988154862 -0.1920495 0.8484661
2       y1           x  0.001891627  0.009551103  0.1980533 0.8437907
3       y2 (Intercept) -8.939784046 16.206935047 -0.5516024 0.5836297
4       y2           x  0.004545156  0.008152140  0.5575415 0.5795966
5       y3 (Intercept) 21.699503502 16.785586452  1.2927462 0.2019249
6       y3           x -0.010879271  0.008443204 -1.2885240 0.2033785

This is a pretty convenient form to continue working with the coefficients. All that is required is to melt the dataframe so that all columns are rows in the dataset, and then to use dplyr's group_by to carry out the regression in all subsets. broom::tidy puts the regression output into a nice dataframe. See ?broom for more information.

In case you need to keep the models to do adjustments of some sort (which are implemented for lm objects), then you can also do the following:

df %>% group_by(variable) %>% do(mod = lm(value ~ x, data=.))

Source: local data frame [3 x 2]
Groups: <by row>

# A tibble: 3 x 2
  variable      mod
*   <fctr>   <list>
1       y1 <S3: lm>
2       y2 <S3: lm>
3       y3 <S3: lm>

Here, for each variable, the lm object is stored in the dataframe. So, if you want to get the model output for the first, you can just access it as you would access any normal dataframe, e.g.

tmp <- df %>% group_by(variable) %>% do(mod = lm(value ~ x, data=.))
tmp[tmp$variable == "y1",]$mod
[[1]]

Call:
lm(formula = value ~ x, data = .)

Coefficients:
(Intercept)            x  
  -1.807255     0.001019  

This is convenient if you want to apply some methods to all lm objects since you can use the fact that tmp$mod gives you a list of them, which makes it easy to pass to e.g. lapply.

JJ Hilfiger

Quite aside from the statistical justification for doing this, the programming problem is an interesting one. Here is a solution, but probably not the most elegant one. First, create a sample data set:

x = c(1962:2014)
y1 = c(rnorm(53))
y2 = c(rnorm(53))
y3 = c(rnorm(53))

mydata = data.frame(x, y1, y2, y3)
attach(mydata)  
head(mydata)
#     x         y1          y2         y3
#1 1962 -0.9884054 -1.68208217  0.5980446
#2 1963 -1.0741098  0.51309753  1.0986366
#3 1964  0.1357549 -0.23427820  0.1482258
#4 1965 -0.8846920 -0.60375400  0.7162992
#5 1966 -0.5529187  0.85573739  0.5541827
#6 1967  0.4881922 -0.09360152 -0.5379037

Next, use a for loop to do several regressions:

for(i in 2:4){
  reg = lm(x ~ mydata[,i])
  print(reg)
  }

Call:
lm(formula = x ~ mydata[, i])

Coefficients:
(Intercept)  mydata[, i]  
  1988.0088      -0.1341  


Call:
lm(formula = x ~ mydata[, i])

Coefficients:
(Intercept)  mydata[, i]  
    1987.87         2.07  


Call:
lm(formula = x ~ mydata[, i])

Coefficients:
(Intercept)  mydata[, i]  
   1987.304       -4.101  
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!