Question
zoo::rollmean is a helpful function that returns the rolling mean of a time series; for a vector x of length n and window size k it returns the vector c(mean(x[1:k]), mean(x[2:(k+1)]), ..., mean(x[(n-k+1):n])).
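For example, with a short vector (a quick illustration of the definition above):
library(zoo)
rollmean(c(1, 2, 3, 4, 5), 3)
# [1] 2 3 4    # i.e. mean(1:3), mean(2:4), mean(3:5)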
I noticed that it seemed to be running slowly for some code I was developing, so I wrote my own version using the Rcpp package and a simple for loop:
library(Rcpp)
cppFunction("NumericVector rmRcpp(NumericVector dat, const int window) {
  const int n = dat.size();
  NumericVector ret(n - window + 1);
  double summed = 0.0;
  // Sum of the first window
  for (int i = 0; i < window; ++i) {
    summed += dat[i];
  }
  ret[0] = summed / window;
  // Slide the window: add the entering element and drop the leaving one,
  // so the whole pass is O(n) regardless of window size
  for (int i = window; i < n; ++i) {
    summed += dat[i] - dat[i - window];
    ret[i - window + 1] = summed / window;
  }
  return ret;
}")
To my surprise, this version of the function is much faster than the zoo::rollmean function:
# Time series with 1000 elements
set.seed(144)
y <- rnorm(1000)
x <- 1:1000
library(zoo)
zoo.dat <- zoo(y, x)
# Make sure our function works
all.equal(as.numeric(rollmean(zoo.dat, 3)), rmRcpp(y, 3))
# [1] TRUE
# Benchmark
library(microbenchmark)
microbenchmark(rollmean(zoo.dat, 3), rmRcpp(y, 3))
# Unit: microseconds
# expr min lq mean median uq max neval
# rollmean(zoo.dat, 3) 685.494 904.7525 1776.88666 1229.2475 1744.0720 15724.321 100
# rmRcpp(y, 3) 6.638 12.5865 46.41735 19.7245 27.4715 2418.709 100
The speedup holds even for much larger vectors:
# Time series with 5 million elements
set.seed(144)
y <- rnorm(5000000)
x <- 1:5000000
library(zoo)
zoo.dat <- zoo(y, x)
# Make sure our function works
all.equal(as.numeric(rollmean(zoo.dat, 3)), rmRcpp(y, 3))
# [1] TRUE
# Benchmark
library(microbenchmark)
microbenchmark(rollmean(zoo.dat, 3), rmRcpp(y, 3), times=10)
# Unit: milliseconds
# expr min lq mean median uq max
# rollmean(zoo.dat, 3) 2825.01622 3090.84353 3191.87945 3206.00357 3318.98129 3616.14047
# rmRcpp(y, 3) 31.03014 39.13862 42.67216 41.55567 46.35191 53.01875
Why does a simple Rcpp implementation run ~100x faster than zoo::rollmean?
Answer 1:
Poking around in zoo, it seems that the rollmean.* methods are all implemented in R, whereas you implemented yours in C++. The packaged R code probably also does a few more checks, etc., so maybe it is not so surprising that you beat it?
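You can see this for yourself from an R session (an illustration added here, not part of the original answer):
library(zoo)
methods(rollmean)    # lists the S3 methods, e.g. rollmean.default, rollmean.zoo
zoo:::rollmean.zoo   # print the method body: it is plain R code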
Answer 2:
Thanks to @DirkEddelbuettel for pointing out that the comparison made in the question wasn't the most fair, because I was comparing C++ code to pure R code. The following is a simple base R implementation (without all the checks from the zoo package); this is quite similar to how zoo::rollmean implements the core computation for the rolling mean:
baseR.rollmean <- function(dat, window) {
  n <- length(dat)
  # Lagged differences: for i > 1, y[i] is the element entering the window
  # minus the element leaving it
  y <- dat[window:n] - dat[c(1, 1:(n-window))]
  # Seed the first entry with the sum of the first window
  y[1] <- sum(dat[1:window])
  # Cumulative sums then give the rolling sums; divide to get rolling means
  return(cumsum(y) / window)
}
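To see why the cumsum trick works, consider a short vector (a worked example added for illustration):
dat <- c(1, 2, 3, 4, 5); window <- 3
dat[window:5] - dat[c(1, 1:2)]   # c(2, 3, 3): entering minus leaving elements
# Replacing the first entry with sum(dat[1:3]) = 6 gives c(6, 3, 3), whose
# cumulative sum c(6, 9, 12) is exactly the vector of rolling sums;
# dividing by 3 gives the rolling means c(2, 3, 4).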
Comparing to zoo::rollmean, we see that this is still a good deal faster:
set.seed(144)
y <- rnorm(1000000)
x <- 1:1000000
library(zoo)
zoo.dat <- zoo(y, x)
# all.equal only compares its first two arguments, so chain the checks pairwise
isTRUE(all.equal(as.numeric(rollmean(zoo.dat, 3)), baseR.rollmean(y, 3))) &&
  isTRUE(all.equal(baseR.rollmean(y, 3), RcppRoll::roll_mean(y, 3))) &&
  isTRUE(all.equal(baseR.rollmean(y, 3), rmRcpp(y, 3)))
# [1] TRUE
library(microbenchmark)
microbenchmark(rollmean(zoo.dat, 3), baseR.rollmean(y, 3), RcppRoll::roll_mean(y, 3), rmRcpp(y, 3), times=10)
# Unit: milliseconds
# expr min lq mean median uq max neval
# rollmean(zoo.dat, 3) 507.124679 516.671897 646.813716 563.897005 593.861499 1220.08272 10
# baseR.rollmean(y, 3) 46.156480 47.804786 53.923974 49.250144 55.061844 76.47908 10
# RcppRoll::roll_mean(y, 3) 7.714032 8.513042 9.014886 8.693255 8.885514 11.32817 10
# rmRcpp(y, 3) 7.729959 8.045270 8.924030 8.388931 8.996384 12.49042 10
To delve into why we were seeing a 10x speedup while using base R, I used Hadley's lineprof tool, grabbing source code from the zoo package source where needed:
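For readers who want to reproduce the profiling, a rough sketch of the setup (the exact steps are assumptions on my part; the original answer only notes that the zoo source was copied in where needed):
# lineprof is a GitHub-only package from Hadley Wickham
devtools::install_github("hadley/lineprof")
library(lineprof)
# lineprof needs source references, so copy the definition of rollmean.zoo
# out of the zoo package sources into a local file and source it:
source("rollmean_zoo.R", keep.source = TRUE)  # hypothetical file name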
lineprof(rollmean.zoo(zoo.dat, 3))
# time alloc release dups ref src
# 1 0.001 0.954 0 26 #27 rollmean.zoo/unclass
# 2 0.001 0.954 0 0 #28 rollmean.zoo/:
# 3 0.002 0.954 0 1 #28 rollmean.zoo
# 4 0.001 1.431 0 0 #28 rollmean.zoo/seq_len
# 5 0.001 0.000 0 0 #28 rollmean.zoo/c
# 6 0.006 2.386 0 1 #28 rollmean.zoo
# 7 0.002 0.954 0 2 #31 rollmean.zoo/cumsum
# 8 0.001 0.000 0 0 #31 rollmean.zoo//
# 9 0.005 1.912 0 1 #33 rollmean.zoo
# 10 0.013 2.898 0 14 #33 rollmean.zoo/[<-
# 11 0.299 28.941 0 127 #34 rollmean.zoo/na.fill
Clearly almost all the time is being spent in the na.fill function, which is actually called after the rolling mean values have already been computed.
lineprof(na.fill.zoo(zoo.dat, fill=NA, 2:999999))
# time alloc release dups ref src
# 1 0.004 1.913 0 39 #26 na.fill.zoo/seq
# 2 0.002 1.921 0 9 #33 na.fill.zoo/coredata
# 3 0.002 1.921 0 6 #37 na.fill.zoo/[<-
# 4 0.001 0.955 0 10 #46 na.fill.zoo
# 5 0.008 3.838 0 19 #46 na.fill.zoo/[<-
# 6 0.003 0.959 0 2 #52 na.fill.zoo
# 7 0.006 0.972 0 21 #52 na.fill.zoo/[<-
# 8 0.001 0.486 0 0 #57 na.fill.zoo/seq_len
# 9 0.005 0.959 0 6 #66 na.fill.zoo
# 10 0.124 11.573 0 34 #66 na.fill.zoo/[
Almost all the time is being spent subsetting the zoo object:
lineprof("[.zoo"(zoo.dat, 2:999999))
# time alloc release dups ref src
# 1 0.004 0.004 0 0 character(0)
# 2 0.002 1.922 0 4 #4 [.zoo/coredata
# 3 0.038 11.082 0 29 #19 [.zoo/zoo
# 4 0.004 0.000 0 1 #28 [.zoo
Almost all of the subsetting time is spent constructing a new zoo object with the zoo function:
lineprof(zoo(y[2:999999], 2:999999))
# time alloc release dups ref src
# 1 0.021 4.395 0 8 c("zoo", "unique") zoo/unique
# 2 0.012 0.477 0 8 c("zoo", "ORDER") zoo/ORDER
# 3 0.001 0.477 0 1 "zoo" zoo
# 4 0.001 0.954 0 0 c("zoo", ":") zoo/:
# 5 0.015 3.341 0 5 "zoo" zoo
Here the time goes into the various operations needed to set up a new zoo object (e.g. determining the unique time points and ordering them).
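The cost of this is easy to see directly by comparing subsetting of the zoo object against subsetting of its underlying numeric vector (a quick check added for illustration; timings omitted):
library(microbenchmark)
microbenchmark(
  zoo.dat[2:999999],            # dispatches to [.zoo, which rebuilds a zoo object
  coredata(zoo.dat)[2:999999],  # plain numeric subsetting of the underlying data
  times = 10
)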
In conclusion, the zoo package appears to have added a lot of overhead to its rolling mean operations by constructing a new zoo object instead of using the internals of the current zoo object; this creates a 10x slowdown compared to a base R implementation and a 100x slowdown compared to an Rcpp implementation.
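One practical workaround (my suggestion, not part of the original answers): compute the rolling mean on the plain numeric coredata and wrap the result as a zoo object only once at the end. For the default center alignment with window 3, the result's index is the original index minus its first and last elements:
library(zoo)
library(RcppRoll)
# Compute on the raw numeric vector, then build a single zoo object at the end;
# the index below is chosen to match rollmean's default center alignment for k = 3.
fast.roll <- zoo(RcppRoll::roll_mean(coredata(zoo.dat), 3),
                 order.by = index(zoo.dat)[2:(length(zoo.dat) - 1)])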
Source: https://stackoverflow.com/questions/30090336/why-is-zoorollmean-slow-compared-to-a-simple-rcpp-implementation