R: cumulative sum over rolling date range

后端 未结 3 1099
庸人自扰
庸人自扰 2020-12-14 12:21

In R, how can I calculate cumsum for a defined time period prior to the row being calculate? Prefer dplyr if possible.

For example, if the period was 10 days, then

相关标签:
3条回答
  • 2020-12-14 13:07

    I recommend using runner package designed to calculate functions on rolling/running windows. You can achieve this by using sum_run - one liner here:

    library(runner)
    library(dplyr)
    
    df %>%
      mutate(
        cum_rolling_10 = sum_run(
          x = df$value, 
          k = 10, 
          idx = as.Date(df$date, format = "%d/%m/%Y"))
      )
    
    
    df
    
    #          date value cum_rolling_10
    # 1   1/01/2000     9              9
    # 2   2/01/2000     1             10
    # 3   5/01/2000     9             19
    # 4   6/01/2000     3             22
    # 5   7/01/2000     4             26
    # 6   8/01/2000     3             29
    # 7  13/01/2000    10             29
    # 8  14/01/2000     9             38
    # 9  18/01/2000     2             21
    # 10 19/01/2000     9             30
    # 11 21/01/2000     8             38
    # 12 25/01/2000     5             24
    # 13 26/01/2000     1             25
    # 14 30/01/2000     6             20
    # 15 31/01/2000     6             18
    

    Enjoy!

    0 讨论(0)
  • 2020-12-14 13:10

    A solution using dplyr, tidyr, lubridate, and zoo.

    library(dplyr)
    library(tidyr)
    library(lubridate)
    library(zoo)
    
    dt2 <- dt %>%
      mutate(date = dmy(date)) %>%
      mutate(cumsum = cumsum(value)) %>%
      complete(date = full_seq(date, period = 1), fill = list(value = 0)) %>%
      mutate(cum_rolling10 = rollapplyr(value, width = 10, FUN = sum, partial = TRUE)) %>%
      drop_na(cumsum)
    dt2
    # A tibble: 15 x 4
             date value cumsum cum_rolling10
           <date> <dbl>  <int>         <dbl>
     1 2000-01-01     9      9             9
     2 2000-01-02     1     10            10
     3 2000-01-05     9     19            19
     4 2000-01-06     3     22            22
     5 2000-01-07     4     26            26
     6 2000-01-08     3     29            29
     7 2000-01-13    10     39            29
     8 2000-01-14     9     48            38
     9 2000-01-18     2     50            21
    10 2000-01-19     9     59            30
    11 2000-01-21     8     67            38
    12 2000-01-25     5     72            24
    13 2000-01-26     1     73            25
    14 2000-01-30     6     79            20
    15 2000-01-31     6     85            18
    

    DATA

    dt <- structure(list(date = c("1/01/2000", "2/01/2000", "5/01/2000", 
    "6/01/2000", "7/01/2000", "8/01/2000", "13/01/2000", "14/01/2000", 
    "18/01/2000", "19/01/2000", "21/01/2000", "25/01/2000", "26/01/2000", 
    "30/01/2000", "31/01/2000"), value = c(9L, 1L, 9L, 3L, 4L, 3L, 
    10L, 9L, 2L, 9L, 8L, 5L, 1L, 6L, 6L)), .Names = c("date", "value"
    ), row.names = c(NA, -15L), class = "data.frame")
    
    0 讨论(0)
  • 2020-12-14 13:24

    this solution will avoid memory overhead, and migrate to sparklyr will be easy.

    lag = 7

        dt %>%
      mutate(date = dmy(date)) %>%
      mutate(order = datediff(date,min(date)) %>% 
      arrange(desc(order)) %>% 
      mutate(n_order = lag(order + lag,1L,default = 0)) %>% 
      mutate(b_order = ifelse(order - n_order >= 0,order,-1)) %>% 
      mutate(m_order = cummax(b_order)) %>% 
      group_by(m_order) %>% 
      mutate(rolling_value = cumsum(value))
    
    0 讨论(0)
提交回复
热议问题