vectorization of “cumulative” regression

Posted by 為{幸葍}努か on 2019-12-12 15:34:54

Question


I have data

dat <- data.frame(t=1:100, y=rnorm(100), x1=rnorm(100), x2=rnorm(100))

where t gives points in time. I would like to regress y on x1 and x2 at each point in time based on the preceding points in time.

I could write a loop:

reg <- matrix(NA_real_, nrow=nrow(dat), ncol=3)
for(i in 11:nrow(dat)){
   reg[i,] <- coefficients(lm(y ~ x1 + x2, data=dat[1:i,]))
}

but I wonder whether anyone knows a way to vectorize this, perhaps using data.table.


Answer 1:


We can use a non-equi-self-join to get the table as you like:

library(data.table)
setDT(dat)
# not clear if you wanted points _strictly_ before present, 
#   but the fix is basically clear -- just add nomatch = 0L to skip the first row
dat[dat, on = .(t <= t), allow.cartesian = TRUE]
        t           y         x1          x2
   1:   1 -0.51729096  0.1765509  1.06562278
   2:   2 -0.51729096  0.1765509  1.06562278
   3:   2  0.85173679 -0.7801053  0.05249113
   4:   3 -0.51729096  0.1765509  1.06562278
   5:   3  0.85173679 -0.7801053  0.05249113
  ---                                       
5046: 100  1.03802913 -2.7042756  2.05639758
5047: 100 -1.29122593  0.9013410  0.77088748
5048: 100  0.08262791  0.4135725  0.92694074
5049: 100 -0.93397320  0.2719790 -0.26097185
5050: 100 -1.23897617  0.9008160  0.61121185
             i.y       i.x1        i.x2
   1: -0.5172910  0.1765509  1.06562278
   2:  0.8517368 -0.7801053  0.05249113
   3:  0.8517368 -0.7801053  0.05249113
   4: -0.5080630 -2.0701757 -1.01573263
   5: -0.5080630 -2.0701757 -1.01573263
  ---                                  
5046: -1.2389762  0.9008160  0.61121185
5047: -1.2389762  0.9008160  0.61121185
5048: -1.2389762  0.9008160  0.61121185
5049: -1.2389762  0.9008160  0.61121185
5050: -1.2389762  0.9008160  0.61121185

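(A bit confusing, but in t <= t the left-hand t refers to the left-hand dat, i.e. the table outside the brackets, and the right-hand t refers to the right-hand dat, i.e. the table passed inside the brackets.)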
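To make the shape of the join easier to see, here is a minimal sketch on a hypothetical 4-row table (the name small and its values are made up for illustration and are not part of the original answer). Each time point is paired with every row at or before it, so the result should have 1 + 2 + 3 + 4 rows.

```r
library(data.table)

# Hypothetical 4-row table, used only to make the join easy to inspect
small <- data.table(t = 1:4, y = c(10, 20, 30, 40))

# For each row of the inner `small`, match every row of the outer `small`
# whose t is less than or equal to that row's t
res <- small[small, on = .(t <= t), allow.cartesian = TRUE]

# Total number of (earlier-or-equal, current) pairs
nrow(res)

# Number of matched rows for each current time point
res[, .N, keyby = t]
```

The per-group counts grow by one at each step, which is the same pattern that produces the 5050 rows in the output above for 100 time points.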

From here we need only group by t and run the regression:

dat[dat, on = .(t <= t), allow.cartesian = TRUE
    ][ , as.list(coef(lm(y ~ x1 + x2))), keyby = t
       # (only adding head here to limit output)
       ][ , head(.SD)]
#    t (Intercept)          x1          x2
# 1: 1  -0.5172910          NA          NA
# 2: 2  -0.2646369 -1.43105510          NA
# 3: 3   9.1879448  9.96212179 -10.7580819
# 4: 4  -0.3504059 -0.36654096   0.4523271
# 5: 5  -0.1681879 -0.06670494   0.3553107
# 6: 6   1.2108223  1.04082291  -0.6947567
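One way to gain confidence in the chained join is to compare it with the loop from the question. The following is only a sketch: the fixed seed, the smaller n, and the names joined, looped and join_mat are assumptions made for illustration.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed, only for reproducibility
n <- 30       # assumption: smaller than the question's 100 to keep it quick
dat <- data.table(t = 1:n, y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n))

# Join-based version: one regression per value of t, on all rows up to t
joined <- dat[dat, on = .(t <= t), allow.cartesian = TRUE
              ][, as.list(coef(lm(y ~ x1 + x2))), keyby = t]

# Loop version from the question, starting at row 11
looped <- matrix(NA_real_, nrow = n, ncol = 3)
for (i in 11:n) {
  looped[i, ] <- coef(lm(y ~ x1 + x2, data = dat[1:i, ]))
}

# Compare the coefficient columns for t >= 11;
# all.equal should return TRUE if the two approaches agree
join_mat <- as.matrix(joined[t >= 11, .(`(Intercept)`, x1, x2)])
all.equal(unname(join_mat), looped[11:n, ])
```

Note that the join still fits one lm per group, so it removes the explicit loop without reducing the number of regressions.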



Answer 2:


Try this solution, which uses lapply with an ad hoc regression function:

# coefficients of the regression fitted on the first i rows of dat
f <- function(i, dat) {
  coefficients(lm(y ~ x1 + x2, data = dat[1:i, ]))
}
lapply(seq_len(nrow(dat)), f, dat = dat)
[[1]]
(Intercept)          x1          x2 
  0.4949079          NA          NA 

[[2]]
(Intercept)          x1          x2 
 -0.4552593   2.4497037          NA 

[[3]]
(Intercept)          x1          x2 
  0.1023961   1.6163017  -0.8490789 

[[4]]
(Intercept)          x1          x2 
 -0.9136870   2.1235787   0.9072042 

...

[[100]]
(Intercept)          x1          x2 
 0.06118874 -0.02917001  0.15879213 
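If a matrix like the question's reg is more convenient than a list, the same function can be run through vapply instead. This is a minimal sketch, assuming the data layout from the question; the fixed seed is an assumption added only for reproducibility.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility
dat <- data.frame(t = 1:100, y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

# Coefficients of y ~ x1 + x2 fitted on the first i rows of dat
f <- function(i, dat) coefficients(lm(y ~ x1 + x2, data = dat[1:i, ]))

# vapply returns a 3 x n matrix (one column per time point);
# transpose it so that each row holds the coefficients for one time point
reg <- t(vapply(seq_len(nrow(dat)), f, numeric(3), dat = dat))

dim(reg)   # rows = time points, columns = coefficients
head(reg)  # early rows contain NA where there are too few observations
```

This still fits one regression per time point, so it tidies the loop rather than vectorizing the arithmetic.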


Source: https://stackoverflow.com/questions/49531250/vectorization-of-cumulative-regression
