dplyr: Summing n leading values

Submitted by ε祈祈猫儿з on 2021-01-28 00:32:05

Question


I have some data like this:

data <- tibble(a = 1:100)

a
--
1
2
3
4
5
6
7
...

Is there any elegant way to create a variable that would be a sum of n leading values? I mean something like this:

data %>% mutate(b = lead(a,1) + lead(a,2) + lead(a,3) + ... + lead(a,n))

For example, in the case of n = 2 I would get:

a      b
--------------
1    2+3 = 5
2    3+4 = 7
3    4+5 = 9
4    5+6 = 11
5    6+7 = 13
6    7+8 = 15
7    8+9 = 17
...

Thanks in advance!


Answer 1:


We're getting dangerously close to recreating the stats::filter function which dplyr masks:

stats::filter(1:10, c(rep(1,2),0), sides=1)
#Time Series:
#Start = 1 
#End = 10 
#Frequency = 1 
# [1] NA NA  5  7  9 11 13 15 17 19

Here's a little function to exactly match the output:

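# Reverse x, take a one-sided moving sum over the n previous values
# (the leading 0 weight excludes the current value), then reverse back.
sumnahead <- function(x,n) {
  rev(stats::filter(rev(x), c(0,rep(1,n)), sides=1))
}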

sumnahead(1:10,2)
#[1]  5  7  9 11 13 15 17 19 NA NA
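
As a sanity check on what the reversed filter computes, the sketch below compares sumnahead with a plain loop. The name sumnahead_naive is hypothetical and not part of the original answer; it simply adds up the n values after each position and leaves NA where fewer than n values remain.

```r
# sumnahead as defined in the answer above
sumnahead <- function(x, n) {
  rev(stats::filter(rev(x), c(0, rep(1, n)), sides = 1))
}

# Hypothetical reference implementation (an assumption for illustration):
# for each position i, sum the n values that follow it.
sumnahead_naive <- function(x, n) {
  len <- length(x)
  out <- rep(NA_real_, len)
  for (i in seq_len(len)) {
    if (i + n <= len) {
      out[i] <- sum(x[(i + 1):(i + n)])
    }
  }
  out
}

x <- 1:10
# Both comparisons should report that the two versions agree
all.equal(as.numeric(sumnahead(x, 2)), sumnahead_naive(x, 2))
all.equal(as.numeric(sumnahead(x, 3)), sumnahead_naive(x, 3))
```

The loop is only meant for checking small inputs; the filter version is the one to use on long vectors.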

It's also fast because it farms out to compiled code. For comparison, lead_n here is the function defined in another answer to this question:

system.time(sumnahead(1:1e7,50))
#   user  system elapsed 
#   2.28    0.22    2.53 
system.time(lead_n(1:1e7,50))
#   user  system elapsed 
#   6.02    4.07   10.13 



Answer 2:


Using a quick function to generate all the lead vectors and add them together:

library(dplyr)

# Build lead(x, 1), ..., lead(x, n) and add them element-wise
lead_n <- function(x, n = 1) {
    leads <- lapply(seq_len(n), function(i) lead(x, i))
    Reduce(`+`, leads)
}

data %>%
    mutate(b = lead_n(a, 2))

Output (first 10 rows):

      a     b
   <int> <int>
 1     1     5
 2     2     7
 3     3     9
 4     4    11
 5     5    13
 6     6    15
 7     7    17
 8     8    19
 9     9    21
10    10    23
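
To make the Reduce step concrete, here is a small sketch (assuming dplyr is installed) that checks lead_n against the hand-written sum from the question for n = 3. The vector a and the name by_hand are just illustrative.

```r
library(dplyr)

# lead_n as in the answer above
lead_n <- function(x, n = 1) {
  leads <- lapply(seq_len(n), function(i) lead(x, i))
  Reduce(`+`, leads)
}

a <- 1:10

# The expression the question wanted to avoid writing out by hand
by_hand <- lead(a, 1) + lead(a, 2) + lead(a, 3)

# Reduce folds the list left to right, which is the same chain of additions
identical(lead_n(a, 3), by_hand)
```

Because every lead vector is materialised before the sum, memory use grows with n, which is consistent with the slower timing reported in the first answer.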



Answer 3:


This is a left-aligned rolling sum offset by one. Wrap it in lead() to shift the result by one position, so that the current value is excluded from each sum.

library(dplyr)

data <- tibble(a = 1:100)

data %>% mutate(b = lead(zoo::rollsum(a, 2, fill = NA, align = 'left')))
#> # A tibble: 100 x 2
#>        a     b
#>    <int> <int>
#>  1     1     5
#>  2     2     7
#>  3     3     9
#>  4     4    11
#>  5     5    13
#>  6     6    15
#>  7     7    17
#>  8     8    19
#>  9     9    21
#> 10    10    23
#> # ... with 90 more rows
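
The same idea extends to any window size by passing n as the rollsum width. A minimal sketch, assuming dplyr and zoo are installed; the helper name roll_lead_sum is hypothetical and not from the original answer:

```r
library(dplyr)

# Hypothetical helper: sum of the n values following each element.
# rollsum with align = "left" sums x[i], ..., x[i + n - 1];
# lead() then shifts the result so the window starts at x[i + 1].
roll_lead_sum <- function(x, n) {
  lead(zoo::rollsum(x, n, fill = NA, align = "left"))
}

data <- tibble(a = 1:100)

# n = 3: the first value of b is 2 + 3 + 4
data %>% mutate(b = roll_lead_sum(a, 3))
```

The last n rows of b are NA, since fewer than n values follow them.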


Source: https://stackoverflow.com/questions/49808908/dplyr-summing-n-leading-values
