Say there is a 2-column data frame with a time or distance column which sequentially increases and an observation column which may have NAs here and there. How can I effici
Ok, how about this.
library(data.table)
dat <- data.table(dat)
setkey(dat, time)
# function to compute a given stat over a time window on a given data.table
window_summary <- function(start_tm, window_len, stat_fn, my_dt) {
pos_vec <- my_dt[, which(time>=start_tm & time<=start_tm+window_len)]
return(stat_fn(my_dt$measure[pos_vec]))
}
# a vector of window start times
start_vec <- seq(from=-2.5, to=dat$time[nrow(dat)], by=2.5)
# sapply'ing the function above over vector of start times
# (in this case, getting mean over 5 second windows)
result <- sapply(start_vec, window_summary,
window_len=5, stat_fn=mean, my_dt=dat)
On my machine, it processes the first 20,000 rows of your large dataset in 13.06781 secs; all rows in 51.58614 secs