dplyr: grouping and summarizing/mutating data with rolling time windows

一整个雨季 2020-12-16 22:12

I have irregular timeseries data representing a certain type of transaction for users. Each line of data is timestamped and represents a transaction at that time. By the i

5 Answers
  •  甜味超标
    2020-12-16 23:05

    This can be done in SQL through the sqldf package, by joining the data to itself:

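    library(sqldf)

    # Convert date to Date class so that the date arithmetic in the query works in days
    dd <- transform(data, date = as.Date(date))

    # Self-join: pair each row a with the rows b of the same id that fall in the
    # 30 days up to a.date and do not come after a in row order, then aggregate per row of a
    sqldf("select a.*, count(*) n_trans30, sum(b.n_widgets) 'total_widgets30'
           from dd a
           left join dd b on b.date between a.date - 30 and a.date
                             and b.id = a.id
                             and b.rowid <= a.rowid
           group by a.rowid")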
    

    giving:

      id       date n_widgets n_trans30 total_widgets30
    1  1 2015-01-01         1         1               1
    2  1 2015-01-01         2         2               3
    3  1 2015-01-05         3         3               6
    4  1 2015-01-25         4         4              10
    5  2 2015-05-05         5         1               5
    6  2 2015-01-01         2         1               2
    7  3 2015-08-01         4         1               4
    8  4 2015-01-01         5         1               5
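
    To see what the join condition selects without going through SQL, the same logic can be sketched in base R. This is only an illustration, not part of the original answer: the sample rows are reconstructed from the output table above (an assumption about the input), and the helper in_window is a hypothetical name introduced here.

    ```r
    # Sample rows reconstructed from the output table above (assumed input)
    dd <- data.frame(
      id = c(1, 1, 1, 1, 2, 2, 3, 4),
      date = as.Date(c("2015-01-01", "2015-01-01", "2015-01-05", "2015-01-25",
                       "2015-05-05", "2015-01-01", "2015-08-01", "2015-01-01")),
      n_widgets = c(1, 2, 3, 4, 5, 2, 4, 5)
    )

    # Hypothetical helper: for row i, flag the rows the SQL join would match,
    # i.e. same id, date within the 30 days up to row i's date, and row index <= i
    in_window <- function(i) {
      dd$id == dd$id[i] &
        dd$date >= dd$date[i] - 30 &
        dd$date <= dd$date[i] &
        seq_len(nrow(dd)) <= i
    }

    rows <- seq_len(nrow(dd))

    # Count of matching transactions and sum of their widgets, one value per row
    n_trans30 <- vapply(rows, function(i) sum(in_window(i)), integer(1))
    total_widgets30 <- vapply(rows, function(i) sum(dd$n_widgets[in_window(i)]), numeric(1))

    result <- cbind(dd, n_trans30 = n_trans30, total_widgets30 = total_widgets30)
    print(result)
    ```

    Each row is compared against every other row, so the cost grows quadratically with the number of rows. That is fine for checking the logic on a small sample, but it would likely be slow on a large table.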
    
