R - Faster Way to Calculate Rolling Statistics Over a Variable Interval

不知归路 2020-12-03 02:00

I'm curious if anyone out there can come up with a (faster) way to calculate rolling statistics (rolling mean, median, percentiles, etc.) over a variable interval of time (

4 Answers
  •  难免孤独
    2020-12-03 02:49

    Let's see... you are using a loop (very slow in R), making unnecessary copies of the data when creating each subset, and using rbind to accumulate your data set. If you avoid those, things will speed up considerably. Try this...

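    # Statistics for the observations within interval/2 days either side of Day
    Summary_Stats <- function(Day, dataframe, interval){
        # Logical index of the rows inside the window (no subset copy of the data frame)
        c1 <- dataframe$Date > Day - interval/2 &
            dataframe$Date < Day + interval/2
        c(
            as.numeric(Day),
            mean(dataframe$Price[c1]),
            median(dataframe$Price[c1]),
            sum(c1),                      # number of observations in the window
            quantile(dataframe$Price[c1], 0.25),
            quantile(dataframe$Price[c1], 0.75)
        )
    }
    # Quick check on a single date
    Summary_Stats(df$Date[2], dataframe=df, interval=20)
    firstDay <- min(df$Date)
    lastDay  <- max(df$Date)
    system.time({
        # firstDay:lastDay yields the day numbers as integers; one column per day
        x <- sapply(firstDay:lastDay, Summary_Stats, dataframe=df, interval=20)
        x <- as.data.frame(t(x))
        names(x) <- c("Date","Average","Median","Count","P25","P75")
        # Convert day numbers back to dates; origin is needed for numeric input in older R
        x$Date <- as.Date(x$Date, origin="1970-01-01")
    })
    dim(x)
    head(x)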
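
    The snippet above relies on a `df` that is not shown here. As a minimal, self-contained sketch of the same idea (one vectorized window test per day, no rbind inside a loop), the following runs on made-up data. The `Date` and `Price` column names come from the function above; `window_stats`, `days`, `result` and the random values are assumptions for illustration only. It swaps `sapply` for `vapply` so that the shape of each per-day result is checked.

    ```r
    # Hypothetical sample data: the real `df` is not shown, so these values are invented.
    set.seed(1)
    df <- data.frame(
        Date  = as.Date("2020-01-01") + sort(sample(0:364, 200)),
        Price = round(runif(200, 50, 150), 2)
    )

    # Statistics for one day; always returns a named numeric vector of length 6.
    # `day` and `dates` are day numbers (numeric), not Date objects.
    window_stats <- function(day, dates, prices, interval) {
        in_window <- dates > day - interval / 2 & dates < day + interval / 2
        p <- prices[in_window]
        c(Date    = day,
          Average = mean(p),
          Median  = median(p),
          Count   = length(p),
          P25     = unname(quantile(p, 0.25)),
          P75     = unname(quantile(p, 0.75)))
    }

    # One entry per calendar day between the first and last observation.
    days      <- seq(min(df$Date), max(df$Date), by = "day")
    dates_num <- as.numeric(df$Date)

    # vapply builds a 6-row matrix (one column per day) in a single pass.
    stats <- vapply(
        as.numeric(days), window_stats, numeric(6),
        dates = dates_num, prices = df$Price, interval = 20
    )

    # Transpose to one row per day and restore the Date class.
    result <- as.data.frame(t(stats))
    result$Date <- as.Date(result$Date, origin = "1970-01-01")
    head(result)
    ```

    Passing the dates as plain numbers avoids relying on how Date objects compare with integers, and the explicit `origin` keeps `as.Date` working on older R versions that require it for numeric input.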
    
