dplyr: grouping and summarizing/mutating data with rolling time windows

一整个雨季 2020-12-16 22:12

I have irregular timeseries data representing a certain type of transaction for users. Each line of data is timestamped and represents a transaction at that time. By the i

5 answers
  •  旧巷少年郎
    2020-12-16 22:47

    For simplicity, I recommend the runner package, which handles sliding-window operations. In the OP's request the window size is k = 30 and the windows depend on the date, so idx = date. You can use the runner function, which applies any R function to a given window, and sum_run, which computes a rolling sum over the same windows.

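    library(runner)
    library(dplyr)
    
    df %>%
      group_by(id) %>%
      # runner expects idx (date) in ascending order within each group
      arrange(date, .by_group = TRUE) %>%
      mutate(
        # number of transactions in the 30-day window ending at the current row
        n_trans30 = runner(n_widgets, f = length, k = 30, idx = date),
        # total widgets in the same 30-day window
        n_widgets30 = sum_run(n_widgets, k = 30, idx = date)
      )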
    
    # id      date       n_widgets n_trans30 n_widgets30
    # 1    2015-01-01         1         1           1
    # 1    2015-01-01         2         2           3
    # 1    2015-01-05         3         3           6
    # 1    2015-01-25         4         4          10
    # 1    2015-02-15         4         2           8
    # 2    2015-01-01         2         1           2
    # 2    2015-05-05         5         1           5
    # 3    2015-08-01         4         1           4
    # 4    2015-01-01         5         1           5
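
    To make the window semantics concrete, here is a plain dplyr/base-R sketch that recomputes both columns without runner. It is not runner's implementation: the helpers in_window, window_count and window_sum are hypothetical, and the sample df is rebuilt from the output table above. It assumes each row's window covers the rows up to and including the current one whose date lies in (date - 30, date], which reproduces the numbers shown.

    ```r
    library(dplyr)

    # Sample data rebuilt from the output table above (illustration only).
    df <- data.frame(
      id = c(1, 1, 1, 1, 1, 2, 2, 3, 4),
      date = as.Date(c("2015-01-01", "2015-01-01", "2015-01-05", "2015-01-25",
                       "2015-02-15", "2015-01-01", "2015-05-05", "2015-08-01",
                       "2015-01-01")),
      n_widgets = c(1, 2, 3, 4, 4, 2, 5, 4, 5)
    )

    # Rows in the window of row i: every row j <= i with date[j] in (date[i] - k, date[i]].
    # Assumes 'date' is sorted ascending, so j <= i already implies date[j] <= date[i].
    in_window <- function(date, i, k) seq_along(date) <= i & date > date[i] - k

    # Hypothetical helpers: count and sum over each row's window (O(n^2) per group).
    window_count <- function(date, k) {
      vapply(seq_along(date), function(i) sum(in_window(date, i, k)), integer(1))
    }
    window_sum <- function(x, date, k) {
      vapply(seq_along(date), function(i) sum(x[in_window(date, i, k)]), numeric(1))
    }

    res <- df %>%
      group_by(id) %>%
      arrange(date, .by_group = TRUE) %>%
      mutate(
        n_trans30 = window_count(date, 30),
        n_widgets30 = window_sum(n_widgets, date, 30)
      ) %>%
      ungroup()

    print(res)
    ```

    The quadratic loop is fine for small groups; for long series, runner's compiled functions are likely the faster option.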
    

    Important: idx (here, date) must be in ascending order, which is why the data are arranged by date within each group before calling runner.
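
    One way to enforce this when the arrange() step might be skipped, or the data reordered later, is a guard that fails loudly if dates are out of order within a group. This is a minimal sketch; check_sorted is a hypothetical helper and assumes the id and date column names from the question.

    ```r
    library(dplyr)

    # Hypothetical helper: stop early if 'date' is not ascending within any 'id'.
    # Ties are allowed, since is.unsorted() is non-strict by default.
    check_sorted <- function(data) {
      bad <- data %>%
        group_by(id) %>%
        summarise(sorted = !is.unsorted(date), .groups = "drop") %>%
        filter(!sorted)
      if (nrow(bad) > 0) {
        stop("date is not in ascending order for id(s): ", paste(bad$id, collapse = ", "))
      }
      invisible(data)
    }

    # Example: id 1 has its dates reversed, id 2 is fine.
    unsorted <- data.frame(
      id = c(1, 1, 2),
      date = as.Date(c("2015-02-01", "2015-01-01", "2015-03-01"))
    )
    tryCatch(check_sorted(unsorted), error = function(e) message(conditionMessage(e)))
    ```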

    For more, see the runner documentation and vignettes.
