I\'m curious if anyone out there can come up with a (faster) way to calculate rolling statistics (rolling mean, median, percentiles, etc.) over a variable interval of time (
Rcpp is a good approach if speed is your primary concern. I'll use the rolling mean statistic to explain by example.
Benchmarks: Rcpp versus R
x = sort(runif(25000,0,4*pi))
y = sin(x) + rnorm(length(x),0.5,0.5)
system.time( rollmean_r(x,y,xout=x,width=1.1) ) # ~60 seconds
system.time( rollmean_cpp(x,y,xout=x,width=1.1) ) # ~0.0007 seconds
Code for Rcpp and R function
cppFunction('
NumericVector rollmean_cpp( NumericVector x, NumericVector y,
NumericVector xout, double width) {
double total=0;
unsigned int n=x.size(), nout=xout.size(), i, ledge=0, redge=0;
NumericVector out(nout);
for( i=0; i width && ledge= (xout[i]-width) & x <= (xout[i]+width)
out[i] = .Internal(mean( y[window] ))
}
return(out)
}
Now for an explantion of rollmean_cpp. x and y are the data. xout is a vector of points at which the rolling statistic is requested. width is the width*2 of the rolling window. Note that the indeces for the ends of sliding window are stored in ledge and redge. These are essentially pointers to the respective elements in x and y. These indeces could be very beneficial for calling other C++ functions (e.g., median and the like) that take a vector and starting and ending indeces as input.
For those who want a "verbose" version of rollmean_cpp for debugging (lengthy):
cppFunction('
NumericVector rollmean_cpp( NumericVector x, NumericVector y,
NumericVector xout, double width) {
double total=0, oldtotal=0;
unsigned int n=x.size(), nout=xout.size(), i, ledge=0, redge=0;
NumericVector out(nout);
for( i=0; i width && ledge