Rolling Regression with Data.table - Update?

孤人 提交于 2019-12-23 04:55:45

问题


I am attempting to run a rolling regression within a data.table. There are a number of questions that get at what I am trying to do, but they are generally 3+ years old and offer inelegant answers. (see: here, for example)

I am wondering if there has been any update to the data.table package that make this more intuitive/ faster?

Here is what I am trying to do. My code looks like this:

DT<-data.table(
  Date = seq(as.Date("2000/1/1"), by = "day", length.out = 1000),
  x1=rnorm(1000),
  x2=rnorm(1000),
  x3=rnorm(1000),
  y=rnorm(1000),
  country=rep(c("a","b","c","d"), each=25))

I would like to regress y on x1, x2 and x3, over a rolling 180 day window, by country, and store the coefficients by date.

Ideally the syntax would look something like this:

DT[,.(coef.x1 := coef(y~x1+x2+x3)[2] , 
coef.x2 := coef(y~x1+x2+x3)[3], 
coef(y~x1+x2+x3)[4],
by=c("country",ROLLING WINDOW)]

... but even more elegant/ avoiding the repetition if possible! :)

I have yet to get the rollapply syntax to work well for me for some reason.

Thank you!


EDIT:

Thank you @michaelchirico.

Your suggestion comes close to what I'm aiming for - and maybe its possible to modify the code to receive it but again, I am stuck.

Here is a more careful articulation of what I need. Some code:

DT<-data.table(
  Date = rep(seq(as.Date("2000/1/1"), by = "day", length.out = 10),times=3), #same dates per country

  x1=rep(rnorm(10),time=3), #x1's repeat - same per country
  x2=rep(rnorm(10), times=3),#x2's repeat - same per country
  x3=rep(rnorm(10), times=3), #x3's repeat - same per country
  y=rnorm(30), #y's do not repeat and are unique per country per day
  country=rep(c("a","b","c"), each=10))

#to calculate the coefficients by individual  country: 
a<-subset(DT,country=="a")
b<-subset(DT,country=="b")

window<-5 #declare window
coefs.a<-coef(lm(y~x1+x2+x3, data=a[1:window]))#initialize my coef variable
coefs.b<-coef(lm(y~x1+x2+x3, data=b[1:window]))#initialize my coef variable

##calculate coefficients per window

for(i in 1:(length(a$Date)-window)){
  coefs.a<-rbind(coefs.a, coef(lm(y~x1+x2+x3, data=a[(i+1):(i+window-1)])))
  coefs.b<-rbind(coefs.b, coef(lm(y~x1+x2+x3, data=b[(i+1):(i+window-1)])))
 }

The difference in this dataset versus the prior one is that the dates, and x1, x2, x3 all repeat. My y's are unique for each country.

In my actual data set I have 120 countries. I can calculate this for each country, but it is awfully slow and then I have to rejoin all of the coefficients into a single dataset for analysis of the results.

Is there a way similar to what you proposed to end up with a single data.table, with all observations?

Thanks once more!!


回答1:


It's still not clear exactly what you're after, but here's a shot which should be close (only minor adjustments need be made depending on the details):

I can't really speak to speed.

TT <- DT[ , uniqueN(Date), by = country][ , max(V1)]
window <- 5
#pre-declare a matrix of windows; each column represents
#one of the possible windows of days
windows <- matrix(1:TT, nrow = TT + 1, ncol = max(TT - window + 1, 1))[1:window, ]

DT[ , {
  #not all possible windows necessarily apply to each
  #  country; subset to find only the relevant windows
  windowsj <- windows[ , 1:(uniqueN(Date) - window + 1)]
  #lapply returns a list (which can be readily assigned with :=)
  lapply(1:ncol(windowsj),
         function(ii){
           #subset to relevant rows
           .SD[windowsj[ , ii],
               #regress, extract
               lm(y ~ x1 + x2 + x3)$coefficients]})},
  by = country]

Comparing the result of this to your coefs.a and coefs.b:

    country         V1          V2         V3          V4          V5          V6
 1:       a -0.8764867  0.46169717  2.6712128  2.66304537  1.18928600  0.53553900
 2:       a -1.0135961  0.03985467  0.6015446  0.61316724  0.24177034  0.86369780
 3:       a -0.1807617 -0.25767309 -2.9492897 -3.05092528 -0.04310375  0.62317993
 4:       a -0.6664342 -0.30732907 -0.3362091 -0.25776715  1.04419854  1.02294125
 5:       b  0.9548685  0.77461810 -0.5100818 -0.57726788 -0.73285223 -1.64196684
 6:       b  0.7179429  0.46107110  0.1732915  0.23262455  0.23258149  3.63679221
 7:       b  0.1639778 -0.22249382  1.4539881  0.58725270  0.54879762 -0.27115275
 8:       b  0.6192641  0.12706750  0.2671673  0.79569434  0.69031761  2.27769679
 9:       c  0.2722200  0.07279085 -0.7709578 -0.74590575 -0.15773196  0.03178821
10:       c  0.8890314  0.74213624  0.4440650  0.34939003  0.50531166  0.16550026
11:       c  0.1589915  0.20531447  0.9931054  1.25495206 -0.01543296 -0.09887655
12:       c  0.7198967  0.70536869  0.4508445  0.02028332 -0.54705588 -0.64246579

> coefs.a
        (Intercept)          x1          x2         x3
coefs.a  -0.8764867 -1.01359605 -0.18076171 -0.6664342
          0.4616972  0.03985467 -0.25767309 -0.3073291
          2.6712128  0.60154458 -2.94928969 -0.3362091
          2.6630454  0.61316724 -3.05092528 -0.2577671
          1.1892860  0.24177034 -0.04310375  1.0441985
          0.5355390  0.86369780  0.62317993  1.0229412

(i.e. it's the same, just transposed)



来源:https://stackoverflow.com/questions/34244660/rolling-regression-with-data-table-update

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!