I am trying to extract interesting statistics for an irregular time series data set, but coming up short on finding the right tools for the job. The tools for manipulating
As of version 1.9.8 (on CRAN 25 Nov 2016), data.table can aggregate in a non-equi join, which can be used to apply a rolling function over a sliding time window of an irregular time series.
For demonstration and verification, a smaller dataset is used.
library(data.table) # development version 1.11.9 used
# create small dataset
set.seed(0)
nSamples <- 10
vecDT <- rexp(nSamples, 3)
vecTimes <- cumsum(c(0,vecDT))
vecVals <- 0:nSamples
vec <- data.table(vecTimes, vecVals)
vec
      vecTimes vecVals
 1: 0.00000000       0
 2: 0.06134553       1
 3: 0.10991444       2
 4: 0.15651286       3
 5: 0.30186907       4
 6: 1.26685858       5
 7: 1.67671260       6
 8: 1.85660688       7
 9: 2.17546271       8
10: 2.22447804       9
11: 2.68805641      10
# define window size in seconds
win_sec = 0.3
# aggregate in sliding window by a non-equi join
vec[.(t = vecTimes, upper = vecTimes + win_sec, lower = vecTimes - win_sec),
    on = .(vecTimes < upper, vecTimes > lower),
    .(t, .N, sliding_mean = mean(vecVals)), by = .EACHI]
     vecTimes     vecTimes          t N sliding_mean
 1: 0.3000000 -0.300000000 0.00000000 4          1.5
 2: 0.3613455 -0.238654473 0.06134553 5          2.0
 3: 0.4099144 -0.190085564 0.10991444 5          2.0
 4: 0.4565129 -0.143487143 0.15651286 5          2.0
 5: 0.6018691  0.001869065 0.30186907 4          2.5
 6: 1.5668586  0.966858578 1.26685858 1          5.0
 7: 1.9767126  1.376712596 1.67671260 2          6.5
 8: 2.1566069  1.556606875 1.85660688 2          6.5
 9: 2.4754627  1.875462707 2.17546271 2          8.5
10: 2.5244780  1.924478037 2.22447804 2          8.5
11: 2.9880564  2.388056413 2.68805641 1         10.0
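The first two columns show the upper and lower bounds of the time interval, respectively; t is the original vecTimes, and N is the number of data points included in each sliding mean. Because the join conditions use strict inequalities (< and >), a point lying exactly on a bound would be excluded from its window.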
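To double-check the N and sliding_mean columns independently of the join, a brute-force pass in base R can be used. This is only a minimal sketch: the time stamps are copied from the printed example data (so they are rounded to eight decimals), and window_idx is a hypothetical helper name introduced here, not a data.table function.

```r
# Example data copied from the printed table (rounded to 8 decimals)
vecTimes <- c(0.00000000, 0.06134553, 0.10991444, 0.15651286, 0.30186907,
              1.26685858, 1.67671260, 1.85660688, 2.17546271, 2.22447804,
              2.68805641)
vecVals <- 0:10
win_sec <- 0.3

# Hypothetical helper: indices of all points strictly inside the
# open window (t0 - win, t0 + win), mirroring the strict < and > in the join
window_idx <- function(t0, times, win) {
  which(times > t0 - win & times < t0 + win)
}

# One index vector per time stamp
idx <- lapply(vecTimes, window_idx, times = vecTimes, win = win_sec)

# Count and mean per window
check <- data.frame(
  t = vecTimes,
  N = lengths(idx),
  sliding_mean = vapply(idx, function(i) mean(vecVals[i]), numeric(1))
)
print(check)
```

The N and sliding_mean columns of check should agree with those of the join result. This loop compares every time stamp with every other one, so it is meant for verification on small data rather than as a replacement for the join.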