Question
I have some data like this:
library(dplyr)
data <- tibble(a = 1:100)
a
--
1
2
3
4
5
6
7
...
Is there any elegant way to create a variable that would be a sum of n leading values? I mean something like this:
data %>% mutate(b = lead(a,1) + lead(a,2) + lead(a,3) + ... + lead(a,n))
For example, in the case of n = 2 I would get:
a    b
--------------
1    2+3 = 5
2    3+4 = 7
3    4+5 = 9
4    5+6 = 11
5    6+7 = 13
6    7+8 = 15
7    8+9 = 17
...
Thanks in advance!
Answer 1:
We're getting dangerously close to recreating the stats::filter function, which dplyr masks (so it has to be called with the stats:: prefix):
stats::filter(1:10, c(rep(1,2),0), sides=1)
#Time Series:
#Start = 1
#End = 10
#Frequency = 1
# [1] NA NA 5 7 9 11 13 15 17 19
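Here's a little function that matches the requested output exactly. Reversing the input turns the trailing filter into a leading one, and the 0 weight skips the current value:
sumnahead <- function(x,n) {
  # reverse, sum the n previous values (weight 0 drops the current one), reverse back
  rev(stats::filter(rev(x), c(0,rep(1,n)), sides=1))
}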
sumnahead(1:10,2)
#[1] 5 7 9 11 13 15 17 19 NA NA
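To see why the weights c(0, rep(1, n)) and the double rev give the right answer, the result can be compared with a deliberately slow reference version. This is only a sketch: sum_n_ahead_naive is a hypothetical helper written for this check, not part of the original answer, and the input vector is arbitrary.

```r
# Hypothetical reference implementation, written only to check sumnahead().
# For each position i it sums x[i + 1], ..., x[i + n], or gives NA near the end.
sum_n_ahead_naive <- function(x, n) {
  len <- length(x)
  vapply(seq_along(x), function(i) {
    if (i + n > len) NA_real_ else sum(x[(i + 1):(i + n)])
  }, numeric(1))
}

# The function from the answer, repeated so the sketch is self-contained.
sumnahead <- function(x, n) {
  rev(stats::filter(rev(x), c(0, rep(1, n)), sides = 1))
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6)
naive <- sum_n_ahead_naive(x, 3)
fast  <- as.numeric(sumnahead(x, 3))

# TRUE if both versions agree, including the positions of the trailing NAs.
isTRUE(all.equal(naive, fast))
```

The last n positions are NA in both versions because there are not n following values left to sum.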
It's also fast because it farms out to compiled code. The timings below compare it with lead_n, which is defined in the next answer:
system.time(sumnahead(1:1e7,50))
#   user  system elapsed
#   2.28    0.22    2.53
system.time(lead_n(1:1e7,50))
#   user  system elapsed
#   6.02    4.07   10.13
Answer 2:
Using a quick function to generate all the lead vectors and add them together:
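lead_n = function(x, n = 1) {
  # build one shifted copy of x per offset 1..n (lead comes from dplyr)
  leads = lapply(1:n, function(i) lead(x, i))
  # add the shifted copies element-wise
  Reduce(`+`, leads)
}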
data %>%
mutate(b = lead_n(a, 2))
Output:
       a     b
   <int> <int>
 1     1     5
 2     2     7
 3     3     9
 4     4    11
 5     5    13
 6     6    15
 7     7    17
 8     8    19
 9     9    21
10    10    23
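How lead_n lines up its shifted copies can be sketched in base R alone. In this sketch lead_base is a hypothetical stand-in for dplyr::lead, written only for illustration, and n is assumed to be at least 1:

```r
# Hypothetical base R stand-in for dplyr::lead(): shift x left by k, pad with NA.
lead_base <- function(x, k) {
  c(x[-seq_len(k)], rep(NA, k))[seq_along(x)]
}

# Same idea as lead_n above, assuming n >= 1.
lead_n_base <- function(x, n = 1) {
  leads <- lapply(seq_len(n), function(i) lead_base(x, i))
  Reduce(`+`, leads)
}

x <- 1:10
shifted <- lapply(1:2, function(i) lead_base(x, i))
shifted[[1]]  # x shifted left by one position
shifted[[2]]  # x shifted left by two positions

# Element-wise sum of the two shifted copies.
res <- lead_n_base(x, 2)
res
```

The last n entries of the result are NA because fewer than n values follow them, so with n = 2 the final two rows of data get NA in b.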
Answer 3:
This is a left-aligned rolling sum offset by one: take the rolling sum over a window of n values (here n = 2) with zoo::rollsum, then apply lead once to exclude the current value.
library(dplyr)
data <- tibble(a = 1:100)
data %>% mutate(b = lead(zoo::rollsum(a, 2, fill = NA, align = 'left')))
#> # A tibble: 100 x 2
#>        a     b
#>    <int> <int>
#>  1     1     5
#>  2     2     7
#>  3     3     9
#>  4     4    11
#>  5     5    13
#>  6     6    15
#>  7     7    17
#>  8     8    19
#>  9     9    21
#> 10    10    23
#> # ... with 90 more rows
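The "rolling sum offset by one" view can also be written without zoo by using cumulative sums, since the sum of the next n values equals cs[i + n] - cs[i]. This is a different technique from the answer above, shown only as a sketch. sum_next_n_cumsum is a hypothetical name, and the sketch assumes x contains no NA values (an NA would propagate through cumsum to every later position).

```r
# Hypothetical helper: sum of the next n values via differences of cumulative sums.
# Assumes x has no NA values and n is a non-negative whole number.
sum_next_n_cumsum <- function(x, n) {
  cs  <- cumsum(as.numeric(x))
  len <- length(x)
  out <- rep(NA_real_, len)
  if (n >= 1 && len > n) {
    idx <- seq_len(len - n)
    # cs[i + n] - cs[i] = x[i + 1] + ... + x[i + n]
    out[idx] <- cs[idx + n] - cs[idx]
  }
  out
}

b <- sum_next_n_cumsum(1:10, 2)
b
```

With non-integer input the differences of cumulative sums can carry small floating-point error, which a windowed sum avoids.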
Source: https://stackoverflow.com/questions/49808908/dplyr-summing-n-leading-values