How do I do a conditional sum which only looks between certain date criteria

遥遥无期 2020-12-18 13:47

Say I have data that looks like

date, user, items_bought, event_number
2013-01-01, x, 2, 1
2013-01-02, x, 1, 2
2013-01-03, x, 0, 3
2013-01-04, x, 0, 4
2013-0         


        
7 answers
  •  别那么骄傲
    2020-12-18 13:59

    I'd like to propose an additional data.table approach, combined with the rollapplyr function from the zoo package.

    First, we aggregate the items_bought column per user and per unique date (as you pointed out, the same date can appear more than once for a user):

    library(data.table)
    data <- setDT(data)[, lapply(.SD, sum), by = c("user", "date"), .SDcols = "items_bought"]
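
    To make the effect of this step concrete, here is a minimal sketch on made-up data in which one user has two rows for the same date. The toy and agg objects are hypothetical and only illustrate the aggregation call used above.

    ```r
    library(data.table)

    # Hypothetical toy data: user "x" has two rows on 2013-01-01
    toy <- data.frame(
      date = as.Date(c("2013-01-01", "2013-01-01", "2013-01-02")),
      user = c("x", "x", "x"),
      items_bought = c(2L, 3L, 1L)
    )

    # Same aggregation as above: collapse to one row per user and date
    agg <- setDT(toy)[, lapply(.SD, sum), by = c("user", "date"), .SDcols = "items_bought"]
    print(agg)
    # The two rows for 2013-01-01 are summed into a single row (2 + 3 = 5)
    ```

    After this step every user has at most one row per date, which is what the rolling sum in the next step relies on.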
    

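    Next, we apply rollapplyr with sum over a window of width 3, which corresponds to 3-day intervals here because every user has one row per consecutive date. Setting partial = TRUE also sums the shorter windows at the start of each user's series (thanks for the advice, @G. Grothendieck):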

    library(zoo)
    data[, cum_items_bought_3_days := lapply(.SD, rollapplyr, 3, sum, partial = TRUE), .SDcols = "items_bought", by = user]
    
    #     user       date items_bought cum_items_bought_3_days
    #  1:    x 2013-01-01            2                       2
    #  2:    x 2013-01-02            1                       3
    #  3:    x 2013-01-03            0                       3
    #  4:    x 2013-01-04            0                       1
    #  5:    x 2013-01-05            3                       3
    #  6:    x 2013-01-06            1                       4
    #  7:    y 2013-01-01            1                       1
    #  8:    y 2013-01-02            1                       2
    #  9:    y 2013-01-03            0                       2
    # 10:    y 2013-01-04            5                       6
    # 11:    y 2013-01-05            6                      11
    # 12:    y 2013-01-06            1                      12
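
    As a rough check of the two points above, the sketch below first compares rollapplyr with and without partial = TRUE on user x's values, then shows what happens when a date is missing. The vectors x, d and v and the by_rows / by_days names are made up for illustration, and the date-based sum is just one simple base R way to do it, not part of the approach above.

    ```r
    library(zoo)

    # items_bought for user x, taken from the table above
    x <- c(2, 1, 0, 0, 3, 1)

    # Without partial = TRUE only complete windows of 3 are returned
    full <- rollapplyr(x, 3, sum)
    # With partial = TRUE the first two (shorter) windows are summed as well
    part <- rollapplyr(x, 3, sum, partial = TRUE)
    print(length(full))  # 4 values for 6 inputs
    print(part)          # same as the cum_items_bought_3_days column for user x

    # Hypothetical series with a gap: 2013-01-03 and 2013-01-04 are missing
    d <- as.Date(c("2013-01-01", "2013-01-02", "2013-01-05"))
    v <- c(2, 1, 3)

    # Row-based window: the last value sums all three rows
    by_rows <- rollapplyr(v, 3, sum, partial = TRUE)

    # Date-based window: sum only rows dated within the 3 days ending at each date
    by_days <- sapply(seq_along(d), function(i) sum(v[d > d[i] - 3 & d <= d[i]]))

    print(by_rows)
    print(by_days)
    ```

    If dates can be missing for a user, the row-based window spans more than 3 calendar days, so either fill in the missing dates with zeros first or use a date-based condition like the one sketched here.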
    

    This is the data set I've used:

    data <- structure(list(
      date = structure(c(15706, 15707, 15708, 15709, 15710, 15711,
                         15706, 15707, 15708, 15709, 15710, 15711), class = "Date"),
      user = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
                       .Label = c(" x", " y"), class = "factor"),
      items_bought = c(2L, 1L, 0L, 0L, 3L, 1L, 1L, 1L, 0L, 5L, 6L, 1L)),
      .Names = c("date", "user", "items_bought"),
      row.names = c(NA, -12L), class = "data.frame")
    
