dplyr: grouping and summarizing/mutating data with rolling time windows

一整个雨季 2020-12-16 22:12

I have irregular timeseries data representing a certain type of transaction for users. Each line of data is timestamped and represents a transaction at that time. By the i

5 Answers
  •  我在风中等你
    2020-12-16 23:00

    I found a way to do this while working on this question

    library(dplyr)      # %>%, filter(), mutate(), select()
    library(lubridate)  # ymd()

    df <- data.frame(
      id = c(1, 1, 1, 1, 1, 2, 2, 3, 4),
      date = c("2015-01-01", 
               "2015-01-01", 
               "2015-01-05", 
               "2015-01-25",
               "2015-02-15",
               "2015-05-05", 
               "2015-01-01", 
               "2015-08-01", 
               "2015-01-01"),
      n_widgets = c(1,2,3,4,4,5,2,4,5)
    )
    
    # Count the rows for user `id2` dated within the `w` days up to and
    # including `date2`.
    count_window <- function(df, date2, w, id2){
      min_date <- date2 - w
      df2 <- df %>% filter(id == id2, date >= min_date, date <= date2)
      nrow(df2)
    }
    # Vectorize over date and id so the function can be called inside mutate().
    v_count_window <- Vectorize(count_window, vectorize.args = c("date2","id2"))
    
    # Sum n_widgets for user `id2` over the same trailing window.
    sum_window <- function(df, date2, w, id2){
      min_date <- date2 - w
      df2 <- df %>% filter(id == id2, date >= min_date, date <= date2)
      sum(df2$n_widgets)
    }
    v_sum_window <- Vectorize(sum_window, vectorize.args = c("date2","id2"))
    
    res <- df %>% mutate(date = ymd(date)) %>% 
      mutate(min_date = date - 30,
             n_trans = v_count_window(., date, 30, id),
             total_widgets = v_sum_window(., date, 30, id)) %>% 
      select(id, date, n_widgets, n_trans, total_widgets)
    res
    
    
      id       date n_widgets n_trans total_widgets
    1  1 2015-01-01         1       2             3
    2  1 2015-01-01         2       2             3
    3  1 2015-01-05         3       3             6
    4  1 2015-01-25         4       4            10
    5  1 2015-02-15         4       2             8
    6  2 2015-05-05         5       1             5
    7  2 2015-01-01         2       1             2
    8  3 2015-08-01         4       1             4
    9  4 2015-01-01         5       1             5
    

    This version is fairly case-specific, but you could probably write a more general version of the functions.
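
One way such a more general version might look is sketched below. It is not part of the original answer: `window_apply` and its arguments (`value_col`, `w`, `fun`, `id_col`, `date_col`) are hypothetical names, and the sketch uses base R only rather than dplyr, so treat it as an illustration under those assumptions. It applies an arbitrary summary function to the values inside the trailing window of `w` days for each row's id.

```r
# Hypothetical helper (an assumption, not from the original answer):
# for every row, apply `fun` to the values of `value_col` that belong to
# the same id and fall inside the window [date - w, date].
window_apply <- function(df, value_col, w, fun,
                         id_col = "id", date_col = "date") {
  ids    <- df[[id_col]]
  dates  <- as.Date(df[[date_col]])
  values <- df[[value_col]]
  vapply(seq_len(nrow(df)), function(i) {
    in_window <- ids == ids[i] & dates >= dates[i] - w & dates <= dates[i]
    as.numeric(fun(values[in_window]))
  }, numeric(1))
}

# Same sample data as in the answer above.
df <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 3, 4),
  date = as.Date(c("2015-01-01", "2015-01-01", "2015-01-05",
                   "2015-01-25", "2015-02-15", "2015-05-05",
                   "2015-01-01", "2015-08-01", "2015-01-01")),
  n_widgets = c(1, 2, 3, 4, 4, 5, 2, 4, 5)
)

# 30-day trailing window: transaction count, widget total, largest transaction.
df$n_trans       <- window_apply(df, "n_widgets", 30, length)
df$total_widgets <- window_apply(df, "n_widgets", 30, sum)
df$max_widgets   <- window_apply(df, "n_widgets", 30, max)
df
```

With the sample data, `n_trans` and `total_widgets` should match the columns printed above. Like the original, this scans all rows once per row, so it is probably only suitable for small to medium data sets.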
