Cumulative sum in a window (or running window sum) based on a condition in R

傲寒 2020-12-17 17:19

I am trying to calculate a cumulative sum over a given window based on a condition. I have seen threads where the solution does a conditional cumulative sum (Calculate a conditio

4 answers
  •  借酒劲吻你
    2020-12-17 18:15

    My solution stays within the tidyverse; if your source data is not excessively large, any performance difference from other approaches may not be an issue.

    I will start by declaring a function that calculates the rolling sum using tibbletime::rollify, then expand the data frame to include the missing FY values. After that, I group and summarise while applying the rolling sum.

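    library(tidyr)
    library(dplyr)
    
    # Rolling version of sum() over a 5-row window (requires the tibbletime package)
    rollsum_5 <- tibbletime::rollify(sum, window = 5)
    
    df %>%
      # Add a row for every FY/Customer/Product combination missing from df
      complete(FY, Customer, Product) %>%
      # Treat the added rows as zero revenue
      replace_na(list(Rev = 0)) %>%
      arrange(Customer, Product, FY) %>%
      group_by(Customer, Product, FY) %>%
      # One row per FY; summarise() drops the FY grouping level
      summarise(Rev = sum(Rev)) %>%
      # Rolling sum within each remaining Customer/Product group
      mutate(cumsum = rollsum_5(Rev)) %>%
      ungroup() %>%
      # Drop rows with zero revenue, including the filled-in ones
      filter(Rev != 0)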
    
    # # A tibble: 16 x 5
    #    Customer Product    FY   Rev cumsum
    #              
    #  1    13575 A        2011  4.00  NA   
    #  2    13575 A        2012  3.00  NA   
    #  3    13575 A        2013  3.00  NA   
    #  4    13575 A        2015  1.00  11.0 
    #  5    13575 A        2016  2.00   9.00
    #  6    13575 B        2011  3.00  NA   
    #  7    13575 B        2012  3.00  NA   
    #  8    13575 B        2013  4.00  NA   
    #  9    13575 B        2014  5.00  15.0 
    # 10    13575 B        2015  6.00  21.0 
    # 11    13578 A        2010  3.00  NA   
    # 12    13578 A        2016  2.00   2.00
    # 13    13578 B        2013  2.00  NA   
    # 14    13578 C        2014  4.00   4.00
    # 15    13578 D        2015  2.00   2.00
    # 16    13578 E        2010  2.00  NA 
    

    N.B. The rolling sum in this case only appears in rows where the full window (5 rows) is intact. It could be misleading to suggest that partial values are equal to a five-year sum.
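
    To make the difference between complete and partial windows concrete, here is a minimal base R sketch. Note that roll_sum and its partial argument are hypothetical helpers written for illustration, not part of tibbletime. The input vector is the zero-filled Rev series for Customer 13575, Product A (FY 2010 to 2016), read off the output above.

    ```r
    # Hypothetical helper: rolling sum over the last `window` values of x.
    # partial = FALSE returns NA until a full window is available.
    # partial = TRUE sums whatever values exist so far.
    roll_sum <- function(x, window = 5, partial = FALSE) {
      n <- length(x)
      out <- rep(NA_real_, n)
      for (i in seq_len(n)) {
        start <- i - window + 1
        if (start >= 1) {
          out[i] <- sum(x[start:i])   # complete window
        } else if (partial) {
          out[i] <- sum(x[1:i])       # incomplete window, fewer than `window` values
        }
      }
      out
    }

    # Rev for FY 2010..2016; 2010 and 2014 are zero-filled
    rev <- c(0, 4, 3, 3, 0, 1, 2)

    full_only    <- roll_sum(rev)                  # NA NA NA NA 10 11 9
    with_partial <- roll_sum(rev, partial = TRUE)  # 0 4 7 10 10 11 9
    ```

    The first four partial values (0, 4, 7, 10) cover fewer than five years, which is why the approach above leaves them as NA.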
