How do I do a conditional sum which only looks between certain date criteria?


遥遥无期 2020-12-18 13:47

Say I have data that looks like

date, user, items_bought, event_number
2013-01-01, x, 2, 1
2013-01-02, x, 1, 2
2013-01-03, x, 0, 3
2013-01-04, x, 0, 4
2013-0


      
      
        
7 answers

别那么骄傲 (OP) 2020-12-18 13:59

I'd like to propose an additional data.table approach, combined with the rollapplyr function from the zoo package.

First, we aggregate the items_bought column per user per unique date (as you pointed out, there could be more than one row for the same date per user).

library(data.table)
# Convert to a data.table by reference, then sum items_bought within each user/date pair
data <- setDT(data)[, lapply(.SD, sum), by = c("user", "date"), .SDcols = "items_bought"]
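
To see what the aggregation step does when a user has several rows on the same date, here is a minimal sketch. The `toy` table and the `daily` name are made up for illustration and are not part of the original data; the `j` expression is written out explicitly rather than with `.SD`, which should give the same result for a single column.

```r
library(data.table)

# Hypothetical toy data: user "x" has two rows on 2013-01-01
toy <- data.table(
  date = as.Date(c("2013-01-01", "2013-01-01", "2013-01-02")),
  user = c("x", "x", "x"),
  items_bought = c(2L, 3L, 1L)
)

# Collapse to one row per user and date by summing items_bought
daily <- toy[, .(items_bought = sum(items_bought)), by = .(user, date)]
print(daily)
```

After this step the two rows for 2013-01-01 are merged into one, so each later rolling window sees each date only once.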


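Next, we compute a rolling sum per user with rollapplyr and sum, using a width of 3 (the current row and the two before it, which is a 3-day interval when every date is present). Passing partial = TRUE covers the margins, so the first rows of each user, where fewer than 3 rows are available, are still summed (thanks for the advice @G. Grothendieck).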

library(zoo)
# Right-aligned rolling sum of width 3, computed separately for each user
data[, cum_items_bought_3_days := lapply(.SD, rollapplyr, 3, sum, partial = TRUE), .SDcols = "items_bought", by = user]

#     user       date items_bought cum_items_bought_3_days
#  1:    x 2013-01-01            2                       2
#  2:    x 2013-01-02            1                       3
#  3:    x 2013-01-03            0                       3
#  4:    x 2013-01-04            0                       1
#  5:    x 2013-01-05            3                       3
#  6:    x 2013-01-06            1                       4
#  7:    y 2013-01-01            1                       1
#  8:    y 2013-01-02            1                       2
#  9:    y 2013-01-03            0                       2
# 10:    y 2013-01-04            5                       6
# 11:    y 2013-01-05            6                      11
# 12:    y 2013-01-06            1                      12
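
The effect of partial = TRUE is easiest to see on a plain vector. The sketch below uses the items_bought values of user x from the output above; the variable names `full_only` and `with_partial` are only for illustration.

```r
library(zoo)

# items_bought for user x, in date order
x <- c(2, 1, 0, 0, 3, 1)

# Default: only complete windows of 3 are summed, so the result is 2 shorter than x
full_only <- rollapplyr(x, 3, sum)

# partial = TRUE: the first two windows are summed over the 1 and 2 values available
with_partial <- rollapplyr(x, 3, sum, partial = TRUE)

print(full_only)
print(with_partial)
```

Because `:=` needs a value for every row in the group, the partial version is the one that fits the data.table call; another option would be `fill = NA`, which keeps the length but leaves the first two rows of each user as NA.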


This is the data set I've used:

data <- structure(list(
  date = structure(c(15706, 15707, 15708, 15709, 15710, 15711,
                     15706, 15707, 15708, 15709, 15710, 15711), class = "Date"),
  user = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
                   .Label = c(" x", " y"), class = "factor"),
  items_bought = c(2L, 1L, 0L, 0L, 3L, 1L, 1L, 1L, 0L, 5L, 6L, 1L)),
  .Names = c("date", "user", "items_bought"),
  row.names = c(NA, -12L), class = "data.frame")
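
One caveat worth noting: rollapplyr with width 3 counts rows, not calendar days, so it matches a 3-day window only when each user has one row for every consecutive date, as in the data set above. If dates can be missing, a window defined on the dates themselves may be closer to what the question asks for. The following is a rough base R sketch under that assumption; the `gappy` data and the `items_3_days` column are hypothetical, and the loop is quadratic in the number of rows, so it is meant for illustration rather than for large data.

```r
# Hypothetical data with a gap: user x has no rows for 2013-01-02 and 2013-01-03
gappy <- data.frame(
  date = as.Date(c("2013-01-01", "2013-01-04", "2013-01-05")),
  user = "x",
  items_bought = c(2L, 1L, 3L)
)

# For each row, sum items_bought of the same user over the 3 calendar days
# ending on that row's date (the date itself and the two days before it)
gappy$items_3_days <- vapply(seq_len(nrow(gappy)), function(i) {
  in_window <- gappy$user == gappy$user[i] &
    gappy$date >  gappy$date[i] - 3 &
    gappy$date <= gappy$date[i]
  sum(gappy$items_bought[in_window])
}, integer(1))

print(gappy)
```

Here the row for 2013-01-04 only counts its own purchase, because 2013-01-01 is more than two days earlier; a row-based window of width 3 would have included it.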

    
             
                                                        
            
            
              
                