Question
I have data
dat <- data.frame(t=1:100, y=rnorm(100), x1=rnorm(100), x2=rnorm(100))
where t gives points in time. I would like to regress y on x1 and x2 at each point in time based on the preceding points in time.
I could create a loop
reg <- matrix(NA, nrow=nrow(dat), ncol=3)
for(i in 11:nrow(dat)){
reg[i,] <- coefficients(lm(y ~ x1 + x2, data=dat[1:i,]))
}
but I wonder whether anyone knows a way to vectorize this, perhaps using data.table.
Answer 1:
We can use a non-equi self-join to build the table you need:
library(data.table)
setDT(dat)
# it is not clear whether you wanted points _strictly_ before the present;
# if so, use t < t and add nomatch = 0L to drop the first row, which has no match
dat[dat, on = .(t <= t), allow.cartesian = TRUE]
t y x1 x2
1: 1 -0.51729096 0.1765509 1.06562278
2: 2 -0.51729096 0.1765509 1.06562278
3: 2 0.85173679 -0.7801053 0.05249113
4: 3 -0.51729096 0.1765509 1.06562278
5: 3 0.85173679 -0.7801053 0.05249113
---
5046: 100 1.03802913 -2.7042756 2.05639758
5047: 100 -1.29122593 0.9013410 0.77088748
5048: 100 0.08262791 0.4135725 0.92694074
5049: 100 -0.93397320 0.2719790 -0.26097185
5050: 100 -1.23897617 0.9008160 0.61121185
i.y i.x1 i.x2
1: -0.5172910 0.1765509 1.06562278
2: 0.8517368 -0.7801053 0.05249113
3: 0.8517368 -0.7801053 0.05249113
4: -0.5080630 -2.0701757 -1.01573263
5: -0.5080630 -2.0701757 -1.01573263
---
5046: -1.2389762 0.9008160 0.61121185
5047: -1.2389762 0.9008160 0.61121185
5048: -1.2389762 0.9008160 0.61121185
5049: -1.2389762 0.9008160 0.61121185
5050: -1.2389762 0.9008160 0.61121185
(A bit confusing, but in t <= t, the LHS t refers to the LHS dat and the RHS t refers to the RHS dat.)
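To make the shape of the join result easier to see, here is a minimal sketch on a hypothetical 3-row table (the name small and the column v are illustrative assumptions, not part of the original data). Each value of t ends up paired with every row at or before it:

```r
library(data.table)

# Hypothetical 3-row table, used only to make the join result easy to read
small <- data.table(t = 1:3, v = c(10, 20, 30))

# For each row of the inner `small`, keep every row of the outer `small`
# whose t is less than or equal to the inner row's t
joined <- small[small, on = .(t <= t), allow.cartesian = TRUE]

joined$t      # t of the inner row, repeated once per matching outer row
joined$v      # v from the outer table: the history up to that t
joined$i.v    # v from the inner table: the value at that t itself
nrow(joined)  # 1 + 2 + 3 rows in total
```

Applied to the 100-row dat, the same pattern gives 1 + 2 + ... + 100 = 5050 rows, which is the row count of the table printed above.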
From here we need only group by t and run the regression:
dat[dat, on = .(t <= t), allow.cartesian = TRUE
][ , as.list(coef(lm(y ~ x1 + x2))), keyby = t
# (only adding head here to limit output)
][ , head(.SD)]
# t (Intercept) x1 x2
# 1: 1 -0.5172910 NA NA
# 2: 2 -0.2646369 -1.43105510 NA
# 3: 3 9.1879448 9.96212179 -10.7580819
# 4: 4 -0.3504059 -0.36654096 0.4523271
# 5: 5 -0.1681879 -0.06670494 0.3553107
# 6: 6 1.2108223 1.04082291 -0.6947567
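As a sanity check, the grouped result can be compared against a direct fit on the first i rows, which is what the loop in the question computes. The sketch below assumes a fixed seed and an arbitrary check point t = 50 (both are illustrative choices, as are the names res, join_fit and loop_fit):

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the check is reproducible
dat <- data.table(t = 1:100, y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

# Cumulative regressions via the non-equi self-join, one row of coefficients per t
res <- dat[dat, on = .(t <= t), allow.cartesian = TRUE
         ][ , as.list(coef(lm(y ~ x1 + x2))), keyby = t]

# Coefficients for t = 50 from the join (dropping the leading t column) ...
join_fit <- unlist(res[t == 50])[-1]
# ... and from a direct fit on rows 1..50, as in the question's loop
loop_fit <- coef(lm(y ~ x1 + x2, data = dat[1:50]))

all.equal(unname(join_fit), unname(loop_fit))
```

Note that the join materializes n(n+1)/2 rows, so this approach trades memory for brevity; whether it is faster than the loop is something to measure rather than assume.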
Answer 2:
Try this solution using lapply on an ad hoc regression function:
f <- function(i, dat) {
  # fit the regression on rows 1..i and return its coefficients
  out <- coefficients(lm(y ~ x1 + x2, data = dat[1:i, ]))
  return(out)
}
lapply(seq_len(nrow(dat)), f, dat = dat)
[[1]]
(Intercept) x1 x2
0.4949079 NA NA
[[2]]
(Intercept) x1 x2
-0.4552593 2.4497037 NA
[[3]]
(Intercept) x1 x2
0.1023961 1.6163017 -0.8490789
[[4]]
(Intercept) x1 x2
-0.9136870 2.1235787 0.9072042
...
[[100]]
(Intercept) x1 x2
0.06118874 -0.02917001 0.15879213
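If a matrix like reg from the question is preferred over a list, the lapply output can be row-bound afterwards. This is a minimal sketch, assuming a fixed seed and the same f as above (the names coefs and reg_mat are illustrative):

```r
set.seed(1)  # assumption: fixed seed so the result is reproducible
dat <- data.frame(t = 1:100, y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

# Same helper as above: coefficients of the regression on rows 1..i
f <- function(i, dat) {
  coefficients(lm(y ~ x1 + x2, data = dat[1:i, ]))
}

coefs <- lapply(seq_len(nrow(dat)), f, dat = dat)

# Each list element is a named vector of length 3 (with NA for coefficients
# that cannot be estimated yet), so rbind stacks them into a 100 x 3 matrix
reg_mat <- do.call(rbind, coefs)

dim(reg_mat)
tail(reg_mat, 3)
```

Unlike the loop in the question, which starts at i = 11, this fills every row, so the first rows contain NA for the coefficients that are not yet identified.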
Source: https://stackoverflow.com/questions/49531250/vectorization-of-cumulative-regression