I'm curious if anyone out there can come up with a (faster) way to calculate rolling statistics (rolling mean, median, percentiles, etc.) over a variable interval of time (
Let's see... you are looping (very slow in R), making unnecessary copies of the data each time you create a subset, and using rbind to accumulate your data set. If you avoid those, things will speed up considerably. Try this...
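# Summary statistics for prices within interval/2 days either side of Day.
Summary_Stats <- function(Day, dataframe, interval){
  # Logical index of the rows in the window (avoids subsetting the whole data frame).
  c1 <- dataframe$Date > Day - interval/2 &
        dataframe$Date < Day + interval/2
  c(
    as.numeric(Day),
    mean(dataframe$Price[c1]),
    median(dataframe$Price[c1]),
    sum(c1),                              # number of observations in the window
    quantile(dataframe$Price[c1], 0.25),
    quantile(dataframe$Price[c1], 0.75)
  )
}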
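# Quick check on a single day.
Summary_Stats(df$Date[2], dataframe=df, interval=20)
firstDay <- min(df$Date)
lastDay <- max(df$Date)
system.time({
  # firstDay:lastDay yields the day numbers (days since 1970-01-01).
  x <- sapply(firstDay:lastDay, Summary_Stats, dataframe=df, interval=20)
  x <- as.data.frame(t(x))   # one row per day
  names(x) <- c("Date","Average","Median","Count","P25","P75")
  # Older R versions require an explicit origin for numeric input.
  x$Date <- as.Date(x$Date, origin="1970-01-01")
})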
dim(x)
head(x)
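If you want to try the approach without your own data, here is a minimal self-contained sketch. The sample data frame is made up purely for illustration (random dates and prices, not your data), and window_stats is a hypothetical helper that applies the same window rule as Summary_Stats. It uses vapply rather than sapply, which is my substitution: it checks that every call returns exactly six numbers.

```r
# Hypothetical sample data: irregularly spaced dates with random prices.
set.seed(1)
df <- data.frame(
  Date  = sort(as.Date("2010-01-01") + sample(0:364, 200, replace = TRUE)),
  Price = round(runif(200, 10, 20), 2)
)

# Same window rule as Summary_Stats: strictly within interval/2 days of `day`.
# `dates` and `day` are plain day numbers (days since 1970-01-01).
window_stats <- function(day, dates, prices, interval) {
  in_window <- dates > day - interval / 2 & dates < day + interval / 2
  p <- prices[in_window]
  c(day, mean(p), median(p), length(p),
    quantile(p, c(0.25, 0.75), names = FALSE))
}

days     <- as.numeric(df$Date)
all_days <- seq(min(days), max(days))   # every calendar day in the range

# vapply fails loudly if any call does not return exactly 6 numeric values.
res <- vapply(all_days, window_stats, numeric(6),
              dates = days, prices = df$Price, interval = 20)
res <- as.data.frame(t(res))
names(res) <- c("Date", "Average", "Median", "Count", "P25", "P75")
res$Date <- as.Date(res$Date, origin = "1970-01-01")

head(res)
```

Days with no observations in the window would come back with a Count of 0 and NA/NaN statistics, so filter on Count if that matters for your use.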