问题
I\'m trying to use R to calculate the moving average over a series of values in a matrix. The normal R mailing list search hasn\'t been very helpful though. There doesn\'t seem to be a built-in function in R will allow me to calculate moving averages. Do any packages provide one? Or do I need to write my own?
回答1:
- Rolling Means/Maximums/Medians in the zoo package (rollmean)
- MovingAverages in TTR
- ma in forecast
回答2:
Or you can simply calculate it using filter, here's the function I use:
ma <- function(x, n = 5){filter(x, rep(1 / n, n), sides = 2)}
If you use dplyr
, be careful to specify stats::filter
in the function above.
回答3:
Using cumsum
should be sufficient and efficient. Assuming you have a vector x and you want a running sum of n numbers
cx <- c(0,cumsum(x))
rsum <- (cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]) / n
As pointed out in the comments by @mzuther, this assumes that there are no NAs in the data. to deal with those would require dividing each window by the number of non-NA values. Here's one way of doing that, incorporating the comment from @Ricardo Cruz:
cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))
cn <- c(0, cumsum(ifelse(is.na(x), 0, 1)))
rx <- cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]
rn <- cn[(n+1):length(cx)] - cn[1:(length(cx) - n)]
rsum <- rx / rn
This still has the issue that if all the values in the window are NAs then there will be a division by zero error.
回答4:
In data.table 1.12.0 new frollmean
function has been added to compute fast and exact rolling mean carefully handling NA
, NaN
and +Inf
, -Inf
values.
As there is no reproducible example in the question there is not much more to address here.
You can find more info about ?frollmean
in manual, also available online at ?frollmean.
Examples from manual below:
library(data.table)
d = as.data.table(list(1:6/2, 3:8/4))
# rollmean of single vector and single window
frollmean(d[, V1], 3)
# multiple columns at once
frollmean(d, 3)
# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))
# multiple columns and multiple windows at once
frollmean(d, c(3, 4))
## three above are embarrassingly parallel using openmp
回答5:
The caTools
package has very fast rolling mean/min/max/sd and few other functions. I've only worked with runmean
and runsd
and they are the fastest of any of the other packages mentioned to date.
回答6:
You could use RcppRoll
for very quick moving averages written in C++. Just call the roll_mean
function. Docs can be found here.
Otherwise, this (slower) for loop should do the trick:
ma <- function(arr, n=15){
res = arr
for(i in n:length(arr)){
res[i] = mean(arr[(i-n):i])
}
res
}
回答7:
In fact RcppRoll
is very good.
The code posted by cantdutchthis must be corrected in the fourth line to the window be fixed:
ma <- function(arr, n=15){
res = arr
for(i in n:length(arr)){
res[i] = mean(arr[(i-n+1):i])
}
res
}
Another way, which handles missings, is given here.
A third way, improving cantdutchthis code to calculate partial averages or not, follows:
ma <- function(x, n=2,parcial=TRUE){
res = x #set the first values
if (parcial==TRUE){
for(i in 1:length(x)){
t<-max(i-n+1,1)
res[i] = mean(x[t:i])
}
res
}else{
for(i in 1:length(x)){
t<-max(i-n+1,1)
res[i] = mean(x[t:i])
}
res[-c(seq(1,n-1,1))] #remove the n-1 first,i.e., res[c(-3,-4,...)]
}
}
回答8:
In order to complement the answer of cantdutchthis and Rodrigo Remedio;
moving_fun <- function(x, w, FUN, ...) {
# x: a double vector
# w: the length of the window, i.e., the section of the vector selected to apply FUN
# FUN: a function that takes a vector and return a summarize value, e.g., mean, sum, etc.
# Given a double type vector apply a FUN over a moving window from left to the right,
# when a window boundary is not a legal section, i.e. lower_bound and i (upper bound)
# are not contained in the length of the vector, return a NA_real_
if (w < 1) {
stop("The length of the window 'w' must be greater than 0")
}
output <- x
for (i in 1:length(x)) {
# plus 1 because the index is inclusive with the upper_bound 'i'
lower_bound <- i - w + 1
if (lower_bound < 1) {
output[i] <- NA_real_
} else {
output[i] <- FUN(x[lower_bound:i, ...])
}
}
output
}
# example
v <- seq(1:10)
# compute a MA(2)
moving_fun(v, 2, mean)
# compute moving sum of two periods
moving_fun(v, 2, sum)
回答9:
One can use runner package for moving functions. In this case mean_run
function. Problem with cummean
is that it doesn't handle NA
values, but mean_run
does:
library(runner)
set.seed(11)
x1 <- rnorm(15)
x2 <- sample(c(rep(NA,5), rnorm(15)), 15, replace = TRUE)
mean_run(x1)
#> [1] -0.5910311 -0.2822184 -0.6936633 -0.8609108 -0.4530308 -0.5332176
#> [7] -0.2679571 -0.1563477 -0.1440561 -0.2300625 -0.2844599 -0.2897842
#> [13] -0.3858234 -0.3765192 -0.4280809
mean_run(x2, na_rm = TRUE)
#> [1] -0.18760011 -0.09022066 -0.06543317 0.03906450 -0.12188853 -0.13873536
#> [7] -0.13873536 -0.14571604 -0.12596067 -0.11116961 -0.09881996 -0.08871569
#> [13] -0.05194292 -0.04699909 -0.05704202
mean_run(x2, na_rm = FALSE )
#> [1] -0.18760011 -0.09022066 -0.06543317 0.03906450 -0.12188853 -0.13873536
#> [7] NA NA NA NA NA NA
#> [13] NA NA NA
mean_run(x2, na_rm = TRUE, k = 4)
#> [1] -0.18760011 -0.09022066 -0.06543317 0.03906450 -0.10546063 -0.16299272
#> [7] -0.21203756 -0.39209010 -0.13274756 -0.05603811 -0.03894684 0.01103493
#> [13] 0.09609256 0.09738460 0.04740283
One can also specify other options like k
window length,lag
, and roll within date window. More in package and function documentation.
回答10:
Though a bit slow but you can also use zoo::rollapply to perform calculations on matrices.
reqd_ma <- rollapply(x, FUN = mean, width = n)
where x is the data set, FUN = mean is the function; you can also change it to min, max, sd etc and width is the rolling window.
来源:https://stackoverflow.com/questions/743812/calculating-moving-average