Say there is a 2-column data frame with a time or distance column which sequentially increases and an observation column which may have NAs here and there. How can I effici
Here are some functions that will give the same output on your first example:
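# Assign each (sorted) value of x to a window of width `window`, starting at `step`.
# Values before `step` all go into bin 0.
partition <- function(x, window, step = 0){
  a = x[x < step]                 # values before the first window
  b = x[x >= step]                # values from `step` onwards
  ia = rep(0, length(a))
  # note: cut() uses right-closed intervals, so a value exactly equal to `step` gets NA
  ib = cut(b, seq(step, max(b) + window, by = window))
  c(ia, ib)                       # the factor is coerced to its integer codes here
}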
# Apply `fun` to df$measure within each window; extra arguments (e.g. na.rm) go to `fun`.
roll <- function(df, window, step = 0, fun, ...){
  tapply(df$measure, partition(df$time, window, step), fun, ...)
}
# Run roll() once per starting offset in `steps`; returns a list named by offset.
roll_steps <- function(df, window, steps, fun, ...){
  X = lapply(steps, roll, df = df, window = window, fun = fun, ...)
  names(X) = steps
  X
}
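One detail of `partition` that may matter for your data: as far as I can tell, `cut()` uses right-closed intervals by default, so an observation whose time is exactly equal to `step` falls outside the first window and ends up as `NA` (and `tapply` then drops it). A minimal sketch of that behaviour, using made-up break points rather than your data:

```r
# cut() defaults to right-closed intervals (lo, hi] with include.lowest = FALSE
breaks <- seq(0, 10, by = 5)   # illustrative breaks: 0, 5, 10
x <- c(0, 2.5, 5)              # illustrative values, including one on the lower edge

default_bins <- as.integer(cut(x, breaks))                 # bins are (0, 5], (5, 10]
left_closed  <- as.integer(cut(x, breaks, right = FALSE))  # bins are [0, 5), [5, 10)

print(default_bins)  # NA 1 1  (0 is not inside (0, 5])
print(left_closed)   # 1 1 2   (5 now belongs to the second bin)
```

If you want windows of the form [start, end) instead, passing `right = FALSE` to `cut()` inside `partition` should do it, though the bin boundaries (and therefore the results) would shift accordingly.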
Output for your first example:
> roll_steps(dat, 5, c(0, 2.5), mean)
$`0`
        1         2         3         4         5
       NA 1.0126639 0.9514456        NA        NA
$`2.5`
        0         1         2         3         4
1.0222694        NA 0.9965048 1.0518228        NA
You can also easily ignore missing values by passing na.rm = TRUE through to the function:
> roll_steps(dat, 5, c(0, 2.5), mean, na.rm = TRUE)
$`0`
        1         2         3         4         5
0.7275438 1.0126639 0.9514456 0.9351326       NaN
$`2.5`
        0         1         2         3         4
1.0222694 0.8138012 0.9965048 1.0518228 0.6122983
This can also be used for a list of data.frames:
> x = lapply(dat2, roll_steps, 5, c(0, 2.5), mean)
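If you want to try this without the original `dat`, here is a self-contained sketch on a tiny made-up data frame. The names `toy` and `toy_list` and their values are hypothetical; only the `time` and `measure` column names are what the functions expect. The function definitions are repeated so the snippet runs on its own.

```r
# Definitions repeated from the answer so the snippet runs standalone
partition <- function(x, window, step = 0){
  a = x[x < step]
  b = x[x >= step]
  ia = rep(0, length(a))
  ib = cut(b, seq(step, max(b) + window, by = window))
  c(ia, ib)
}
roll <- function(df, window, step = 0, fun, ...){
  tapply(df$measure, partition(df$time, window, step), fun, ...)
}
roll_steps <- function(df, window, steps, fun, ...){
  X = lapply(steps, roll, df = df, window = window, fun = fun, ...)
  names(X) = steps
  X
}

# Hypothetical data (not the question's `dat`): increasing times, a few NA measures
toy <- data.frame(time    = c(1, 2, 4, 6, 7, 9, 12),
                  measure = c(1, NA, 3, 4, 5, NA, 7))

res <- roll_steps(toy, 5, c(0, 2.5), mean, na.rm = TRUE)
print(res)
# $`0`   : bins 1, 2, 3 with means 2, 4.5, 7
# $`2.5` : bins 0, 1, 2 with means 1, 4, 7

# Same call over a list of data frames; na.rm is forwarded through lapply
toy_list <- list(a = toy, b = transform(toy, measure = measure * 2))
res_list <- lapply(toy_list, roll_steps, 5, c(0, 2.5), mean, na.rm = TRUE)
print(res_list$b$`0`)  # 4, 9, 14
```

Note that `partition` relies on the time column already being sorted in increasing order, as in your description, since it concatenates the "before `step`" bins and the "after `step`" bins in that order.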