Conditional rolling mean (moving average) on irregular time series

萌比男神i 2020-12-09 12:54

I have a group of data in the format:

ID    Minutes Value
xxxx  118     3 
xxxx  121     4 
xxxx  122     3 
yyyy  122     6 
xxxx  123     4 
yyyy  123              


        
2 Answers
  • 2020-12-09 13:13

    An alternative approach that uses tidyr/dplyr instead of data.table and RcppRoll instead of zoo:

    library(dplyr)
    library(tidyr)
    library(RcppRoll)
    
    d %>% 
      group_by(ID) %>%
      # add rows for unobserved minutes
      complete(Minutes = full_seq(Minutes, 1)) %>%
      # RcppRoll::roll_mean() is written in C++ for speed 
      mutate(moving_mean = roll_mean(Value, 131, fill = NA, na.rm = TRUE)) %>%
      # keep only the rows that were originally observed
      filter(!is.na(Value))
    

    Example data:

    # tibble() replaces the deprecated data_frame(); dplyr re-exports it
    d <- tibble(
      ID = rep(1:3, each = 5),
      Minutes = rep(c(1, 30, 60, 120, 200), 3),
      Value = rpois(15, lambda = 10)
    )
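
    As a cross-check on the padding approach, the same kind of centred window can be computed directly on the irregular timestamps in base R. This is only a minimal sketch: `window_mean`, `d2`, and the half-width of 65 minutes (matching a 131-minute window) are assumptions for illustration. Unlike `roll_mean(..., fill = NA)`, it averages over a partial window near the start and end of each series instead of returning NA.

    ```r
    # Hypothetical helper: for each observation, average every value whose
    # timestamp lies within `half_width` minutes of it (centred window).
    window_mean <- function(minutes, value, half_width) {
      vapply(
        seq_along(minutes),
        function(i) {
          in_window <- abs(minutes - minutes[i]) <= half_width
          mean(value[in_window], na.rm = TRUE)
        },
        numeric(1)
      )
    }

    # Toy data with made-up values, irregularly spaced in time.
    d2 <- data.frame(
      ID      = rep(c("a", "b"), each = 4),
      Minutes = rep(c(1, 30, 60, 200), 2),
      Value   = c(2, 4, 6, 8, 10, 20, 30, 40)
    )

    # Apply the helper separately to the rows of each ID.
    d2$moving_mean <- NA_real_
    for (idx in split(seq_len(nrow(d2)), d2$ID)) {
      d2$moving_mean[idx] <- window_mean(d2$Minutes[idx], d2$Value[idx], half_width = 65)
    }
    ```

    This version is quadratic in the number of rows per ID, so it suits small groups or spot checks rather than long series.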
    
  • 2020-12-09 13:24

    You can fill in the missing Minutes (Value will be NA for the added rows), then use rollapply from zoo:

    library(data.table)
    library(zoo)
    
    ## Convert to data.table (column names as in the question: ID, Minutes, Value)
    DT <- data.table(DF, key=c("ID", "Minutes"))
    
    ## Missing Minutes will be added in. Value will be set to NA. 
    DT <- DT[CJ(unique(ID), seq(min(Minutes), max(Minutes)))]
    
    ## Run your function: a 60-minute rolling mean within each ID
    DT[, rollapply(Value, 60, mean, na.rm=TRUE), by=ID]
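
    Without a `fill` argument, rollapply returns fewer values than it was given (length minus width plus one), so the result cannot be stored as a column directly. A minimal sketch of one way to keep the means aligned with the rows, assuming a toy `DF` with made-up values, a window of 3 minutes, and a right-aligned (trailing) window; the column name `moving_mean` is also an assumption:

    ```r
    library(data.table)
    library(zoo)

    # Toy input with made-up values, using the column names from the question.
    DF <- data.frame(
      ID      = c("xxxx", "xxxx", "xxxx", "yyyy", "yyyy"),
      Minutes = c(1L, 2L, 4L, 1L, 4L),
      Value   = c(3, 4, 5, 6, 8)
    )
    DT <- data.table(DF, key = c("ID", "Minutes"))

    # Pad every ID to the full minute range; unobserved minutes get Value = NA.
    padded <- DT[CJ(unique(ID), seq(min(Minutes), max(Minutes)))]

    # fill = NA keeps the output as long as the input, so := can store it.
    # align = "right" means each mean covers the current and the 2 previous minutes.
    padded[, moving_mean := rollapply(Value, 3, mean, na.rm = TRUE,
                                      fill = NA, align = "right"),
           by = ID]

    # Keep only the rows that were originally observed.
    result <- padded[!is.na(Value)]
    ```

    The first two minutes of each ID stay NA because no full 3-minute window exists there yet.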
    

    Alternatively, if you don't need to keep the padded Minutes / NA Values, you can do it all in one shot:

    ## Convert your DF to a data.table
    DT <- data.table(DF, key=c("ID", "Minutes"))
    
    ## Compute rolling means, with on-the-fly padded minutes
    DT[ CJ(unique(ID), seq(min(Minutes), max(Minutes))) ][, 
      rollapply(Value, 60, mean, na.rm=TRUE), by=ID]
    