I am trying to calculate a cumulative sum over a given window based on a condition. I have seen threads where the solution does a conditional cumulative sum (Calculate a conditio
My solution stays on the tidyverse side of things. If your source data is not excessively large, the performance difference may not be an issue.
I will start by declaring a function that calculates the rolling sum using tibbletime::rollify, then expand the data frame to include the missing FY values. After that, group and summarise while applying the rolling sum.
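library(tidyr)
library(dplyr)

# Rolling sum over a window of 5 rows
rollsum_5 <- tibbletime::rollify(sum, window = 5)

df %>%
  # Add a row for every FY/Customer/Product combination; new rows get Rev = NA
  complete(FY, Customer, Product) %>%
  replace_na(list(Rev = 0)) %>%
  arrange(Customer, Product, FY) %>%
  group_by(Customer, Product, FY) %>%
  # One row per year; the result is still grouped by Customer and Product
  summarise(Rev = sum(Rev)) %>%
  mutate(cumsum = rollsum_5(Rev)) %>%
  ungroup() %>%
  # Drop zero-revenue rows, including the filler rows added above
  filter(Rev != 0)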
# # A tibble: 16 x 5
#    Customer Product    FY   Rev cumsum
#
#  1    13575 A        2011  4.00 NA
#  2    13575 A        2012  3.00 NA
#  3    13575 A        2013  3.00 NA
#  4    13575 A        2015  1.00 11.0
#  5    13575 A        2016  2.00 9.00
#  6    13575 B        2011  3.00 NA
#  7    13575 B        2012  3.00 NA
#  8    13575 B        2013  4.00 NA
#  9    13575 B        2014  5.00 15.0
# 10    13575 B        2015  6.00 21.0
# 11    13578 A        2010  3.00 NA
# 12    13578 A        2016  2.00 2.00
# 13    13578 B        2013  2.00 NA
# 14    13578 C        2014  4.00 4.00
# 15    13578 D        2015  2.00 2.00
# 16    13578 E        2010  2.00 NA
N.B. The rolling sum in this case only appears in rows where the full window (5 rows) is available; earlier rows show NA. It could be misleading to suggest that partial values are equal to a five-year sum.
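To see why the missing years are filled in before rolling, here is a minimal sketch on made-up data. The `toy` tibble is hypothetical. The use of `full_seq()` is my assumption and is not part of the pipeline above. It is needed here because the toy data has a single group, so no other group supplies the missing year the way it does in the real data.

```r
library(dplyr)
library(tidyr)

rollsum_5 <- tibbletime::rollify(sum, window = 5)

# Hypothetical toy data: one customer/product with no row for 2014
toy <- tibble(
  FY  = c(2011, 2012, 2013, 2015, 2016),
  Rev = c(4, 3, 3, 1, 2)
)

# Without filling the gap, the five-row window spans six calendar years
unfilled <- toy %>%
  mutate(cumsum = rollsum_5(Rev))
unfilled$cumsum
# NA NA NA NA 13

# With 2014 added as a zero row, each window covers exactly five years
filled <- toy %>%
  complete(FY = full_seq(FY, 1), fill = list(Rev = 0)) %>%
  mutate(cumsum = rollsum_5(Rev))
filled$cumsum
# NA NA NA NA 11 9
```

The filled version reproduces the 11 and 9 shown for customer 13575, product A in the output above, while the unfilled version reports 13 for 2016, which is really a six-year total.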