Compute rolling sum by id variables, with missing timepoints

终归单人心 2020-12-14 03:50

I'm trying to learn R, and there are a few things I've done for 10+ years in SAS that I can't quite figure out the best way to do in R. Take this data:

 id         


        
4 Answers
  •  甜味超标
    2020-12-14 03:57

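    A fairly efficient solution to this problem uses the data.table package: for each row, sum the counts of all rows in the same id/class group whose date falls within the 124 days up to and including that row's date.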

    ##Utilize the data.table package
    library("data.table")
    ##t, class, id, count and desired are the vectors from the question's data (t is a Date)
    data <- data.table(t,class,id,count,desired)[order(id,class)]
    
    ##Assign each customer an ID
    data[,Cust_No:=.GRP,by=c("id","class")]
    
    ##Create "list" of comparison dates and values
    Ref <- data[,list(Compare_Value=list(I(count)),Compare_Date=list(I(t))), by=c("id","class")]
    
    ##Compare the two lists and sum the values whose compare date falls within the previous 124 days
    data$Roll.Val <- mapply(FUN = function(RD, NUM) {
      d <- as.numeric(Ref$Compare_Date[[NUM]] - RD)
      sum((d <= 0 & d >= -124)*Ref$Compare_Value[[NUM]])
    }, RD = data$t,NUM=data$Cust_No)
    
    ##Print out data
    data <- data[,list(id,class,t,count,desired,Roll.Val)][order(id,class)]
    data
    
       id class          t count desired Roll.Val
    1:  1     A 2010-01-15     1       1        1
    2:  1     A 2010-02-15     2       3        3
    3:  1     B 2010-04-15     3       3        3
    4:  1     B 2010-09-15     4       4        4
    5:  2     A 2010-01-15     5       5        5
    6:  2     B 2010-06-15     6       6        6
    7:  2     B 2010-08-15     7      13       13
    8:  2     B 2010-09-15     8      21       21
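
The same 124-day window can also be expressed as a non-equi join in data.table, which avoids the per-row `mapply` loop. The following is a minimal sketch, not the answerer's method: the input rows are reconstructed from the printed output above (an assumption, since the question's data is not shown), and the names `dt`, `bounds`, `window_days`, `t_start`, `t_end` and `roll_sum` are made up for illustration.

```r
library(data.table)

# Input reconstructed from the printed output above (assumption)
dt <- data.table(
  id    = c(1, 1, 1, 1, 2, 2, 2, 2),
  class = c("A", "A", "B", "B", "A", "B", "B", "B"),
  t     = as.Date(c("2010-01-15", "2010-02-15", "2010-04-15", "2010-09-15",
                    "2010-01-15", "2010-06-15", "2010-08-15", "2010-09-15")),
  count = 1:8
)

# Window length in days, taken from the -124 used in the answer
window_days <- 124

# One window [t_start, t_end] per row of dt
bounds <- dt[, .(id, class, t_start = t - window_days, t_end = t)]

# Non-equi join: for each window, sum the counts of the rows in the same
# id/class group whose date lies inside the window (both ends inclusive)
roll <- dt[bounds,
           on = .(id, class, t >= t_start, t <= t_end),
           .(roll_sum = sum(count)),
           by = .EACHI]

# by = .EACHI returns one row per row of bounds, in the same order as dt
dt[, roll_sum := roll$roll_sum]
print(dt)
```

With these reconstructed rows, `roll_sum` should agree with the `desired` column shown above. Because every row always falls inside its own window, the join never produces an empty group, so no `NA` handling is needed in this sketch.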
    
