I suspect this is a somewhat simple question with multiple solutions, but I'm still a bit of a novice in R and an exhaustive search didn't yield answers that spoke well to what I'm wanting to do.
I'm trying to create, for lack of better term, "moving sums" for a variable in my data frame. These would be 3-year and 5-year sums, lagged one year. So, a 5-year sum for an observation in 1986 would be the sum of all previous observations in 1981, 1982, 1983, 1984, and 1985. Here is an example of what I would like to do, where the sum variable is the sum of all x
in the five years prior to the observation year.
country year x x5yrsum
A 1980 9 NA
A 1981 3 NA
A 1982 5 NA
A 1983 6 NA
A 1984 9 NA
A 1985 7 32
A 1986 9 30
A 1987 4 36
.....................
B 1990 0 NA
B 1991 4 NA
B 1992 2 NA
B 1993 6 NA
B 1994 3 NA
B 1995 7 15
B 1996 0 22
This is unbalanced panel data. I suspect ddply
would be appropriate, but I wouldn't know the exact coding for it.
Any input would be appreciated.
You can use filter
in ddply
(or any other function implementing the "split-apply-combine" approach):
library(plyr)
ddply(DF, .(country), transform,
x5yrsum2 = as.numeric(filter(x,c(0,rep(1,5)),sides=1)))
# country year x x5yrsum x5yrsum2
# 1 A 1980 9 NA NA
# 2 A 1981 3 NA NA
# 3 A 1982 5 NA NA
# 4 A 1983 6 NA NA
# 5 A 1984 9 NA NA
# 6 A 1985 7 32 32
# 7 A 1986 9 30 30
# 8 A 1987 4 36 36
# 9 B 1990 0 NA NA
# 10 B 1991 4 NA NA
# 11 B 1992 2 NA NA
# 12 B 1993 6 NA NA
# 13 B 1994 3 NA NA
# 14 B 1995 7 15 15
# 15 B 1996 0 22 22
If DF
is the input three-column data frame then use ave
with rollapplyr
from zoo. Note that we use a width of k+1
and then drop the k+1st element from the sum so that the current value of x
is excluded and only the remaining k
values are summed:
library(zoo)
k <- 5
roll <- function(x) rollapplyr(x, k+1, function(x) sum(x[-k-1]), fill = NA)
transform(DF, xSyrsum = ave(x, country, FUN = roll))
which gives:
country year x xSyrsum
1 A 1980 9 NA
2 A 1981 3 NA
3 A 1982 5 NA
4 A 1983 6 NA
5 A 1984 9 NA
6 A 1985 7 32
7 A 1986 9 30
8 A 1987 4 36
9 B 1990 0 NA
10 B 1991 4 NA
11 B 1992 2 NA
12 B 1993 6 NA
13 B 1994 3 NA
14 B 1995 7 15
15 B 1996 0 22
you can also use filter
of standard packages (stats
) to do moving sum:
ms=function(x,n=5) as.numeric(stats::filter(x,rep(1, n),method="convolution",sides=1))
x=c(1,2,3,4,5,6,7,8,9)
ms(x,5)
NA NA NA NA 15 20 25 30 35
To do a 1-lag, insert NA
at the begining and take the number of elements (or lines):
ms.1lag=c(NA,ms(x,5))[1:length(x)]
cbind(x,ms.1lag)
x ms.1lag
[1,] 1 NA
[2,] 2 NA
[3,] 3 NA
[4,] 4 NA
[5,] 5 NA
[6,] 6 15
[7,] 7 20
[8,] 8 25
[9,] 9 30
If you use this frequently,
ms=function(x,n=5,lag=0)
c(rep(NA,lag),
as.numeric(stats::filter(x,rep(1, n),method="convolution",sides=1)))[1:length(x)]
cbind(x,ms5.1=ms(x,5,1))
x ms5.1
[1,] 1 NA
[2,] 2 NA
[3,] 3 NA
[4,] 4 NA
[5,] 5 NA
[6,] 6 15
[7,] 7 20
[8,] 8 25
[9,] 9 30
来源:https://stackoverflow.com/questions/17573226/generating-a-moving-sum-variable-in-r