Question
Summary (tldr)
I need to perform a rolling regression on an irregular time series (i.e. the interval may not even be periodic and may go from 0, 1, 2, 3... to ...7, 20, 24, 28...) that is simply numeric and does not necessarily require date/time, but the rolling window needs to be by time. So if I have a time series that is irregularly sampled for 600 seconds and the window is 30, the regression is performed every 30 seconds, and not every 30 samples.
I've read examples, and while I could replicate doing rolling sums and medians by time, I can't seem to figure it out for regression.
The problem
First of all, I have read some of the other questions with regards to performing rolling functions on irregular time series data, such as this: optimized rolling functions on irregular time series with time-based window, and this: Rolling window over irregular time series.
The issue is that the examples provided so far are for simple functions like sum or median, but I have not yet figured out how to perform a simple rolling regression, i.e. using lm, that still respects the requirement that the window is based on an irregular time series. Also, my time series is much, much simpler; no date is necessary, it's simply time "elapsed".
Anyway, getting this right is important to me because irregular timing (for example, a skip in the time interval) may give an over- or underestimate of the coefficients in the rolling regression, as a window defined by sample count will then cover additional time.
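To illustrate the concern, here is a minimal sketch using the ten samples around the first gap in the sample data given further down (x = 21 to 32). The variable names are chosen for illustration only. A window of the last 10 rows spans 11 time units because of the skip, whereas a window of the last 10 time units keeps only 8 of those rows, so the two fitted slopes differ.

```r
# Ten consecutive samples around the gap between x = 29 and x = 32
x <- c(21, 22, 23, 24, 25, 26, 27, 28, 29, 32)
y <- c(37, 36, 35, 34, 35, 34, 33, 32, 31, 30)

# Row-based window: the last 10 samples, whatever time they span
slope_rows <- unname(coef(lm(y ~ x))[2])

# Time-based window: only samples within the last 10 time units, (22, 32]
in_win <- x > max(x) - 10
x_win <- x[in_win]
y_win <- y[in_win]
slope_time <- unname(coef(lm(y_win ~ x_win))[2])

diff(range(x))   # time spanned by the row-based window
sum(in_win)      # number of samples kept by the time-based window
c(rows = slope_rows, time = slope_time)
```

The time-based slope is the one that corresponds to the manually calculated value for x = 32 in the expected output.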
So I was wondering if anyone can help me with creating a function that does this in the simplest way? The dataset is based on measuring a variable over time i.e. 2 variables: time, and response. Time is measured every x time elapsed units (seconds, minutes, so not date/time formatted), but once in a while it becomes irregular.
For every row, the function should perform a linear regression based on a width of n time units. The width should never exceed n units, but may be floored (i.e. reduced) to accommodate irregular time sampling. So for example, if the width is specified as 20 seconds, but time is sampled every 6 seconds, then the window will be rounded to 18, not 24 seconds.
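The flooring rule in that example can be sketched as follows. This is only an illustration: `window_span` is a hypothetical helper, and it assumes the window for a row at time `t` is the half-open interval `(t - n, t]`.

```r
# Hypothetical helper: time actually covered by the samples in (t - n, t]
window_span <- function(x, t, n) {
  in_win <- x > t - n & x <= t
  diff(range(x[in_win]))
}

x <- seq(0, 60, by = 6)                 # sampled every 6 seconds
span <- window_span(x, t = 24, n = 20)  # samples at 6, 12, 18, 24 fall in the window
span
```

With these inputs the window covers 18 seconds rather than 24, because the sample at 0 lies outside the 20-second limit.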
I have looked at the question here: How to calculate the average slope within a moving window in R, and I tested that code on an irregular time series, but it looks like it's based on regular time series.
Sample data:
sample <-
structure(list(x = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 47, 48,
49), y = c(50, 49, 48, 47, 46, 47, 46, 45, 44, 43, 44, 43, 42,
41, 40, 41, 40, 39, 38, 37, 38, 37, 36, 35, 34, 35, 34, 33, 32,
31, 30, 29, 28, 29, 28, 27, 26, 25, 26, 25, 24, 23, 22, 21, 20,
19)), .Names = c("x", "y"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -46L))
My current code (based on a previous question I referred to). I know it's not subsetting by time:
library(zoo)
clm <- function(z) coef(lm(y ~ x, as.data.frame(z)))
rollme <- rollapplyr(zoo(sample), 10, clm, by.column = F, fill = NA)
The expected output (manually calculated) is below. The output is different from a regular rolling regression -- the numbers are different as soon as the time interval skips at 29 (secs):
NA
NA
NA
NA
NA
NA
NA
NA
NA
-0.696969697
-0.6
-0.551515152
-0.551515152
-0.6
-0.696969697
-0.6
-0.551515152
-0.551515152
-0.6
-0.696969697
-0.6
-0.551515152
-0.551515152
-0.6
-0.696969697
-0.6
-0.551515152
-0.551515152
-0.6
-0.696969697
-0.605042017
-0.638888889
-0.716981132
-0.597560976
-0.528301887
-0.5
-0.521008403
-0.642857143
-0.566666667
-0.551515152
-0.551515152
-0.6
-0.696969697
-0.605042017
-0.638888889
-0.716981132
I hope I've provided enough information; if not, let me know, or point me to a good example that I could try.
Other things I have tried: I've tried converting the time to POSIXct format but I don't know how to perform lm on that:
require(lubridate)
x <- as.POSIXct(strptime(sample$x, format = "%S"))
Update: Added tldr section.
Answer 1:
Try this:
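library(zoo)  # for rollapplyr(); clm() is the helper defined in the question

# time interval is 1
sz=10         # window width
pl2=list()    # per-row window offsets relative to the current row
for ( i in 1:nrow(sample)){
  # placeholder width for the first rows, where a full window is not yet available
  if (i<sz) period=sz else
    # number of samples with x in (x[i] - sz, x[i]], minus the current one
    period=length(sample$x[sample$x>(sample$x[i]-sz) & sample$x<=sample$x[i]])-1
  pl2[[i]]=seq(-period,0)
}
#update for time interval > 1
sz=10
tint=1        # time interval; the window covers sz*tint time units
pl2=list()
for ( i in 1:nrow(sample)){
  if (i<sz) period=sz else
    period=length(sample$x[sample$x>(sample$x[i]-sz*tint) & sample$x<=sample$x[i]])-1
  pl2[[i]]=seq(-period,0)
}
# pass the list of offsets as the width, so each row gets its own window
rollme3 <- rollapplyr(zoo(sample), pl2, clm, by.column = F, fill = NA)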
> tail(rollme3)
(Intercept) x
41 47.38182 -0.5515152
42 49.20000 -0.6000000
43 53.03030 -0.6969697
44 49.26050 -0.6050420
45 50.72222 -0.6388889
46 54.22642 -0.7169811
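As a cross-check on the output above, here is a minimal base R sketch that fits each time window directly, without zoo. `roll_slope_by_time` and `dat` are hypothetical names introduced for illustration (`dat` is a copy of the question's data), and the window is assumed to be `(t - width, t]`. Note that it also returns slopes for early rows as soon as two samples are available, rather than NA.

```r
# Copy of the question's sample data (46 rows, gaps after x = 29 and x = 44)
dat <- data.frame(
  x = c(0:29, 32:44, 47:49),
  y = c(50, 49, 48, 47, 46, 47, 46, 45, 44, 43, 44, 43, 42,
        41, 40, 41, 40, 39, 38, 37, 38, 37, 36, 35, 34, 35, 34, 33, 32,
        31, 30, 29, 28, 29, 28, 27, 26, 25, 26, 25, 24, 23, 22, 21, 20,
        19)
)

# Hypothetical helper: slope of y ~ x over the time window (t - width, t]
roll_slope_by_time <- function(x, y, width) {
  vapply(seq_along(x), function(i) {
    in_win <- x > x[i] - width & x <= x[i]
    # A slope needs at least two points; otherwise return NA
    if (sum(in_win) < 2) return(NA_real_)
    xi <- x[in_win]
    yi <- y[in_win]
    unname(coef(lm(yi ~ xi))[2])
  }, numeric(1))
}

slopes <- roll_slope_by_time(dat$x, dat$y, width = 10)
tail(slopes)
```

The last six values should agree with the x column of tail(rollme3) shown above.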
Answer 2:
For the sake of completeness, here is an answer which uses data.table to aggregate in a non-equi join.
Although there are many similar questions, e.g., r calculating rolling average with window based on value (not number of rows or date/time variable), this question deserves an answer of its own as the OP is looking for the coefficients of a rolling regression.
library(data.table)
ws <- 10 # size of sliding window in time units
setDT(sample)[.(start = x - ws, end = x), on = .(x > start, x <= end),
as.list(coef(lm(y ~ x.x))), by = .EACHI]
      x  x (Intercept)        x.x
 1: -10  0    50.00000         NA
 2:  -9  1    50.00000 -1.0000000
 3:  -8  2    50.00000 -1.0000000
 4:  -7  3    50.00000 -1.0000000
 5:  -6  4    50.00000 -1.0000000
 6:  -5  5    49.61905 -0.7142857
 7:  -4  6    49.50000 -0.6428571
 8:  -3  7    49.50000 -0.6428571
 9:  -2  8    49.55556 -0.6666667
10:  -1  9    49.63636 -0.6969697
11:   0 10    49.20000 -0.6000000
12:   1 11    48.88485 -0.5515152
13:   2 12    48.83636 -0.5515152
14:   3 13    49.20000 -0.6000000
15:   4 14    50.12121 -0.6969697
16:   5 15    49.20000 -0.6000000
17:   6 16    48.64242 -0.5515152
18:   7 17    48.59394 -0.5515152
19:   8 18    49.20000 -0.6000000
20:   9 19    50.60606 -0.6969697
21:  10 20    49.20000 -0.6000000
22:  11 21    48.40000 -0.5515152
23:  12 22    48.35152 -0.5515152
24:  13 23    49.20000 -0.6000000
25:  14 24    51.09091 -0.6969697
26:  15 25    49.20000 -0.6000000
27:  16 26    48.15758 -0.5515152
28:  17 27    48.10909 -0.5515152
29:  18 28    49.20000 -0.6000000
30:  19 29    51.57576 -0.6969697
31:  22 32    49.18487 -0.6050420
32:  23 33    50.13889 -0.6388889
33:  24 34    52.47170 -0.7169811
34:  25 35    48.97561 -0.5975610
35:  26 36    46.77358 -0.5283019
36:  27 37    45.75000 -0.5000000
37:  28 38    46.34454 -0.5210084
38:  29 39    50.57143 -0.6428571
39:  30 40    47.95556 -0.5666667
40:  31 41    47.43030 -0.5515152
41:  32 42    47.38182 -0.5515152
42:  33 43    49.20000 -0.6000000
43:  34 44    53.03030 -0.6969697
44:  37 47    49.26050 -0.6050420
45:  38 48    50.72222 -0.6388889
46:  39 49    54.22642 -0.7169811
      x  x (Intercept)        x.x
Please note that rows 10 to 30, where the time series is regularly spaced, are identical to OP's rollme.
The call to as.list() forces the result of coef(lm(...)) to appear in separate columns.
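The effect of as.list() can be seen in isolation with a minimal base R sketch. The toy data frame `d` is made up for illustration and is not part of the question's data. coef() returns a single named numeric vector, while as.list() turns it into a list with one named element per coefficient, which is the shape that becomes one column per coefficient in the join above.

```r
# Toy data, made up for illustration only
d <- data.frame(x = c(1, 2, 3, 4), y = c(2.1, 3.9, 6.2, 7.8))

cf <- coef(lm(y ~ x, data = d))
str(cf)        # a single named numeric vector of length 2

cf_list <- as.list(cf)
str(cf_list)   # a list with elements "(Intercept)" and "x"
```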
The code above uses a right aligned rolling window. However, the code can be easily adapted to support a left aligned window as well:
# left aligned window
setDT(sample)[.(start = x, end = x + ws), on = .(x >= start, x < end),
as.list(coef(lm(y ~ x.x))), by = .EACHI]
Source: https://stackoverflow.com/questions/46860333/rolling-regression-on-irregular-time-series