Linear regression with only previous values in moving window

百般思念 提交于 2021-02-10 06:49:49

问题


I have a huge dataset and would like to perform a rolling linear regression over a window of 60. However, I want that only the 60 previous values are considered for the linear regression.

My Dataframe DF consists of following Columns:

Date          Company   Y     X1   X2
01.01.2015    Mill     0.13   -1    -3
01.02.2015    Mill     0.16   1    5 
01.03.2015    Mill     0.83   3    4
01.04.2015    Mill     -0.83  23   4
01.01.1988    Hall    0.23    1    3
01.02.1988    Hall    0.24    23   2
01.03.1988    Hall    0.78    19   -9
01.04.1988    Hall    0.73    4    12
01.05.1988    Hall    0.72    5    12
01.11.2008    Jopo    0.12    0.9  32
01.12.2008    Jopo    0.13    10   32
01.01.2009    Jopo    0.32    0.2  10
01.02.2009    Jopo    0.32    2    -1

I have several thousand companies and data for several months for each company. The regression has to be done for every month of a company, with the rolling window of 60 previous months of this specific company.

In the given example, assuming only a rolling window of 3, I want for company Mill a regression on 01.04.2015 with the data from 01.01-01.03-2015. For company Hall I want regressions on 01.04 and 01.05.1988, and for Jopo I want a regression on 01.02.2009.

Ideally, the results will be pasted together with Company and Date in a new data frame, as I have to keep working with this data and have to analyse it more.

Following code should do the trick for the rolling regression, however it does not use the previous 60 dates, but 59 and includes the current date too:

library(zoo)

rolled <- function(df) {                                    
    rollapply(df, width = 60,
        FUN = function(z) coef(lm(Y ~ X1+X2, data = as.data.frame(z))),
        by.column = FALSE, align = "right"
)
}    

Following code does the regression dependent on the company name, as I want to make regressions for each individual company, independend from the other companies.

Test <- do.call("rbind", by(DF[c("Y", "X1", "X2")], DF[c( "Name")], rolled))

How do I incorporate, that only the 60 previous values are used for the regression? And maybe someone knows how to show also "Company" and "Date" in the results? Thanks for your help!


回答1:


Assume DF is as given reproducibly in the Note at the end. Use by to split DF into company rows and apply the anonymous function using rollapplyr. Note that rollapplyr can take for the width a list argument with the offsets of the positions to use. For example, list(-seq(3)) means use the 3 prior rows (as suggested in the question) but not the current row (which would have position 0).

library(zoo)

# w <- 60    
w <- 3
Coef <- function(x) coef(lm(as.data.frame(x)))
do.call("rbind", by(DF, DF$Company, function(x) 
    cbind(x, rollapplyr(x[3:5], list(-seq(w)), Coef, fill = NA, by.column = FALSE))))

giving:

              Date Company     Y   X1 X2 (Intercept)         X1         X2
Hall.5  01.01.1988    Hall  0.23  1.0  3          NA         NA         NA
Hall.6  01.02.1988    Hall  0.24 23.0  2          NA         NA         NA
Hall.7  01.03.1988    Hall  0.78 19.0 -9          NA         NA         NA
Hall.8  01.04.1988    Hall  0.73  4.0 12     0.37711 -0.0017480 -0.0484553
Hall.9  01.05.1988    Hall  0.72  5.0 12     1.30333 -0.0433333 -0.0333333
Jopo.10 01.11.2008    Jopo  0.12  0.9 32          NA         NA         NA
Jopo.11 01.12.2008    Jopo  0.13 10.0 32          NA         NA         NA
Jopo.12 01.01.2009    Jopo  0.32  0.2 10          NA         NA         NA
Jopo.13 01.02.2009    Jopo  0.32  2.0 -1     0.41104  0.0010989 -0.0091259
Mill.1  01.01.2015    Mill  0.13 -1.0 -3          NA         NA         NA
Mill.2  01.02.2015    Mill  0.16  1.0  5          NA         NA         NA
Mill.3  01.03.2015    Mill  0.83  3.0  4          NA         NA         NA
Mill.4  01.04.2015    Mill -0.83 23.0  4     0.21611  0.2994444 -0.0711111

You could also try this:

library(broom)
fun <- function(x) unlist(tidy(lm(as.data.frame(x)))[, -1]) 
do.call("rbind", by(DF, DF$Company, function(x) 
 cbind(x, rollapplyr(x[3:5], list(-(seq(w))), fun, fill = NA, by.column = FALSE))))

which gives:

              Date Company     Y   X1 X2 estimate1    estimate2    estimate3
Hall.5  01.01.1988    Hall  0.23  1.0  3        NA           NA           NA
Hall.6  01.02.1988    Hall  0.24 23.0  2        NA           NA           NA
Hall.7  01.03.1988    Hall  0.78 19.0 -9        NA           NA           NA
Hall.8  01.04.1988    Hall  0.73  4.0 12 0.3771138 -0.001747967 -0.048455285
Hall.9  01.05.1988    Hall  0.72  5.0 12 1.3033333 -0.043333333 -0.033333333
Jopo.10 01.11.2008    Jopo  0.12  0.9 32        NA           NA           NA
Jopo.11 01.12.2008    Jopo  0.13 10.0 32        NA           NA           NA
Jopo.12 01.01.2009    Jopo  0.32  0.2 10        NA           NA           NA
Jopo.13 01.02.2009    Jopo  0.32  2.0 -1 0.4110390  0.001098901 -0.009125874
Mill.1  01.01.2015    Mill  0.13 -1.0 -3        NA           NA           NA
Mill.2  01.02.2015    Mill  0.16  1.0  5        NA           NA           NA
Mill.3  01.03.2015    Mill  0.83  3.0  4        NA           NA           NA
Mill.4  01.04.2015    Mill -0.83 23.0  4 0.2161111  0.299444444 -0.071111111
        std.error1 std.error2 std.error3 statistic1 statistic2 statistic3
Hall.5          NA         NA         NA         NA         NA         NA
Hall.6          NA         NA         NA         NA         NA         NA
Hall.7          NA         NA         NA         NA         NA         NA
Hall.8         NaN        NaN        NaN        NaN        NaN        NaN
Hall.9         NaN        NaN        NaN        NaN        NaN        NaN
Jopo.10         NA         NA         NA         NA         NA         NA
Jopo.11         NA         NA         NA         NA         NA         NA
Jopo.12         NA         NA         NA         NA         NA         NA
Jopo.13        NaN        NaN        NaN        NaN        NaN        NaN
Mill.1          NA         NA         NA         NA         NA         NA
Mill.2          NA         NA         NA         NA         NA         NA
Mill.3          NA         NA         NA         NA         NA         NA
Mill.4         NaN        NaN        NaN        NaN        NaN        NaN
        p.value1 p.value2 p.value3
Hall.5        NA       NA       NA
Hall.6        NA       NA       NA
Hall.7        NA       NA       NA
Hall.8       NaN      NaN      NaN
Hall.9       NaN      NaN      NaN
Jopo.10       NA       NA       NA
Jopo.11       NA       NA       NA
Jopo.12       NA       NA       NA
Jopo.13      NaN      NaN      NaN
Mill.1        NA       NA       NA
Mill.2        NA       NA       NA
Mill.3        NA       NA       NA
Mill.4       NaN      NaN      NaN
> 

Alternative

Another possibility is to use a width of w+1 and then remove the last component.

# w <- 60    
w <- 3 
Coef1 <- function(x) coef(lm(as.data.frame(head(x, -1))))
do.call("rbind", by(DF, DF$Company, function(x) 
    cbind(x, rollapplyr(x[3:5], w+1, Coef1, fill = NA, by.column = FALSE))))

Fewer than w+1 rows in a company

If there are companies with fewer than w+1 rows then try this. It uses the partial=TRUE argument of rollapplyr to compute lm with fewer rows and modifies Coef accordingly so that it will continue to work:

# w <- 60    
w <- 3
Coef <- function(x) coef(lm(as.data.frame(matrix(x, c(nrow(x), 1)))))
do.call("rbind", by(DF, DF$Company, function(x) cbind(x, 
  rollapplyr(x[3:5], list(-seq(w)), Coef, partial = TRUE, by.column = FALSE))))

Note: Input DF is:

Lines <- "Date          Company   Y     X1   X2
01.01.2015    Mill     0.13   -1    -3
01.02.2015    Mill     0.16   1    5 
01.03.2015    Mill     0.83   3    4
01.04.2015    Mill     -0.83  23   4
01.01.1988    Hall    0.23    1    3
01.02.1988    Hall    0.24    23   2
01.03.1988    Hall    0.78    19   -9
01.04.1988    Hall    0.73    4    12
01.05.1988    Hall    0.72    5    12
01.11.2008    Jopo    0.12    0.9  32
01.12.2008    Jopo    0.13    10   32
01.01.2009    Jopo    0.32    0.2  10
01.02.2009    Jopo    0.32    2    -1"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)


来源:https://stackoverflow.com/questions/42584748/linear-regression-with-only-previous-values-in-moving-window

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!