How do I do a conditional sum which only looks between certain date criteria

遥遥无期 2020-12-18 13:47

Say I have data that looks like

date, user, items_bought, event_number
2013-01-01, x, 2, 1
2013-01-02, x, 1, 2
2013-01-03, x, 0, 3
2013-01-04, x, 0, 4
2013-0         


        
7 Answers
  •  情书的邮戳
    2020-12-18 13:58

    It seems that the xts and zoo packages contain functions that do what you want, although with a dataset the size of your actual one you may hit the same problems as with @alexis_laz's answer. The functions from the xts answer to this question seem to do the trick.

    First I took the code from the answer I link to above and made sure it worked for just one user. I include the apply.daily function because, judging from your edits and comments, some users have multiple observations on some days. I added an extra row to the toy dataset to reflect this.

    # Make dataset with two observations for one date for "y" user
    dat <- structure(list(
        date = structure(c(15706, 15707, 15708, 15709, 15710, 15711, 
            15706, 15707, 15708, 15709, 15710, 15711, 15711), class = "Date"), 
        user = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "y", "y", "y"),
        items_bought = c(2L, 1L, 0L, 0L, 3L, 1L, 1L, 1L, 0L, 5L, 6L, 1L, 0L)),
        .Names = c("date", "user", "items_bought"),
        row.names = c(NA, -13L),
        class = "data.frame")
    
    # Load xts (attaching it also attaches zoo, which xts depends on)
    library(xts)
    
    # See if this works for one user
    dat1 = subset(dat, user == "y")
    # Create "xts" object for use with apply.daily()
    dat1.1 = xts(dat1$items_bought, dat1$date)
    dat2 = apply.daily(dat1.1, sum)
    # Now use rollapply with a window of 3 observations
    # (this equals 3 days here because user "y" has a row for every date)
    # The "partial" argument appears to only work with zoo objects, not xts
    sum.itemsbought = rollapply(zoo(dat2), 3, sum, align = "right", partial = TRUE)
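
    One caveat worth spelling out: the width passed to rollapply counts observations, not calendar days, so the window only matches "the last 3 days" when no dates are missing. A minimal sketch of one way to handle gaps, assuming a date with no data should count as zero purchases. The `totals` series below is made up for illustration and is not from the question.

    ```r
    library(zoo)

    # Hypothetical daily totals with 2013-01-03 missing (illustrative values)
    days   <- as.Date(c("2013-01-01", "2013-01-02", "2013-01-04", "2013-01-05"))
    totals <- zoo(c(2, 1, 4, 3), days)

    # Merge with an empty zoo series on a complete daily index;
    # dates that had no data are filled with 0
    grid   <- zoo(, seq(start(totals), end(totals), by = "day"))
    padded <- merge(totals, grid, fill = 0)

    # A width of 3 now spans exactly 3 calendar days
    rolled <- rollapply(padded, 3, sum, align = "right", partial = TRUE)
    rolled  # values: 2 3 3 5 7
    ```

    Without the padding step, the window ending on 2013-01-04 would also pick up 2013-01-01, which is 4 days back.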
    

    I thought the output could look nicer (more like example output from your question). I haven't worked with zoo objects much, but the answer to this question gave me some pointers for putting the info into a data.frame.

    data.frame(Date=time(sum.itemsbought), sum.itemsbought, row.names=NULL)
    

    Once I had this worked out for one user, it was straightforward to expand this to the entire toy dataset. This is where speed could become an issue. I use lapply and do.call for this step.

    allusers = lapply(unique(dat$user), function(x) {
        dat1 = dat[dat$user == x,]
        dat1.1 = xts(dat1$items_bought, dat1$date)
        dat2 = apply.daily(dat1.1, sum)
        sum.itemsbought = rollapply(zoo(dat2), 3, sum, align = "right", partial = TRUE)
        data.frame(Date=time(sum.itemsbought), user = x, sum.itemsbought, row.names=NULL)
    } )
    do.call(rbind, allusers)
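
    To sanity-check the rolling sums on a small sample, the same "sum within a date window" condition can be written directly in base R. This is only a sketch for cross-checking, not a faster alternative: it compares every row against every other row, so it will not scale to a large dataset. The `toy` data frame and the `sum_3day` column name are made up for illustration.

    ```r
    # Toy data in the same shape as `dat` (values are illustrative);
    # user "y" has two rows on 2013-01-02
    toy <- data.frame(
        date = as.Date("2013-01-01") + c(0, 1, 2, 0, 1, 1),
        user = c("x", "x", "x", "y", "y", "y"),
        items_bought = c(2L, 1L, 0L, 1L, 1L, 5L)
    )

    # Collapse to one row per user and day, then order by user and date
    daily <- aggregate(items_bought ~ user + date, data = toy, FUN = sum)
    daily <- daily[order(daily$user, daily$date), ]

    # For each row, sum the same user's purchases from 2 days earlier
    # through the row's own date (a 3-calendar-day window)
    daily$sum_3day <- vapply(seq_len(nrow(daily)), function(i) {
        in_window <- daily$user == daily$user[i] &
            daily$date >= daily$date[i] - 2 &
            daily$date <= daily$date[i]
        sum(daily$items_bought[in_window])
    }, numeric(1))
    ```

    For this toy input, sum_3day comes out as 2, 3, 3 for user "x" and 1, 7 for user "y". Because the condition is on the dates themselves, missing days need no special handling here.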
    
