Rolling sum on an unbalanced time series

Submitted by 橙三吉。 on 2019-12-24 09:58:50

Question


I have a series of annual incident counts per category, with no rows for years in which the category did not see an incident. I would like to add a column that shows, for each year, how many incidents occurred in the previous three years.

One way to handle this is to add empty rows for all years with zero incidents, then use rollapply() with a left-aligned four year window, but that would expand my data set more than I want to. Surely there's a way to use ddply() and transform for this?
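For comparison, here is a minimal base-R sketch of the padding approach described above, run on a hypothetical one-category toy series. It swaps zoo's rollapply() for a difference of cumulative sums, and it assumes "previous three years" means years y-3 through y-1, excluding year y itself. The names `toy`, `full`, `observed`, and `prev3` are illustrative only.

```r
# Hypothetical toy series: category A has no rows for 2002 and 2003
toy <- data.frame(category = "A",
                  year = c(2000, 2001, 2004),
                  incidents = c(1, 2, 3))
toy$observed <- TRUE  # marks real rows so the padding can be dropped later

# Pad: one row per category and year across the observed range
full <- expand.grid(category = unique(toy$category),
                    year = seq(min(toy$year), max(toy$year)),
                    stringsAsFactors = FALSE)
full <- merge(full, toy, all.x = TRUE)  # sorted by category, then year
full$incidents[is.na(full$incidents)] <- 0

# With consecutive years, the sum over the previous three rows is a
# difference of cumulative sums: cs[i] holds the total of rows 1..(i - 1)
full$prev3 <- ave(full$incidents, full$category, FUN = function(x) {
  cs <- c(0, cumsum(x))
  i <- seq_along(x)
  cs[i] - cs[pmax(i - 3, 1)]
})

# Drop the padding rows again
result <- full[!is.na(full$observed), c("category", "year", "incidents", "prev3")]
print(result)
```

The padded frame has five rows for three observations; across many categories with long gaps, that growth is exactly the cost the question wants to avoid.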

The following code builds a dummy data set and then runs a simple plyr sum by category:

library(plyr)

dat <- data.frame(
   category=c(rep('A',6), rep('B',6), rep('C',6)),
   year=rep(c(2000,2001,2004,2005,2009,2010),3),
   incidents=rpois(18, 3)
   )

# one total per category, repeated on every row of that category
ddply(dat, .(category), transform, i_per_c=sum(incidents))

That works, but it only shows a per-category total.

I want a total that's year-dependent.

So I try to expand the ddply() call with the function() syntax, like so:

ddply(dat, .(category) , transform, 
      function(x) i_per_c=sum(ifelse(x$year >= year - 4 & x$year < year,  x$incidents, 0) )
      )

This just returns the original data frame, unmodified.
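The unmodified result has a specific cause: the function is passed to transform() as its only, unnamed argument, so transform() evaluates it to a function object (it never calls it) and, having no column name to assign, hands each group back as it was. The sketch below reproduces that on hypothetical toy values and then passes a named argument instead. It assumes the window is the three years before each row's year; `prev3` is an illustrative column name.

```r
library(plyr)

# Hypothetical toy data: two categories, gaps in the years
dat <- data.frame(category  = rep(c("A", "B"), each = 3),
                  year      = rep(c(2000, 2001, 2004), 2),
                  incidents = c(1, 2, 3, 4, 5, 6))

# Unnamed argument: transform() receives a function object with no
# column name, adds nothing, and each group comes back unchanged
unchanged <- ddply(dat, .(category), transform,
                   function(x) sum(x$incidents))

# Named argument: evaluated inside each category's rows, so `year` and
# `incidents` are that category's columns; sapply() visits each row's year
fixed <- ddply(dat, .(category), transform,
               prev3 = sapply(year, function(y)
                 sum(incidents[year >= y - 3 & year < y])))
print(fixed)
```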

I must be missing something in the plyr syntax, but I don't know what it is.

Thanks, Matt


Answer 1:


This is sorta ugly, but it works. Nested ply calls:

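library(plyr)

ddply(dat, .(category),
    function(datc) adply(datc, 1,
         # for each row x, sum the category's incidents for years in (x$year - 2, x$year]
         function(x) data.frame(run_incidents =
                                sum(subset(datc, year > (x$year - 2) & year <= x$year)$incidents))))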

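There might be a slightly cleaner way to do it, and there are definitely ways that execute much faster, since adply() calls the function once per row and re-subsets the whole category each time. Note also that the window here, year > (x$year - 2) & year <= x$year, covers only the row's own year and the one before it; for the three preceding years the question asks about, the condition would be year >= x$year - 3 & year < x$year.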
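As one example of a faster route, the sketch below (base R, not from the original answer) computes the window sum for a whole category at once: outer() builds a year-by-year membership matrix and colSums() adds up the matching incidents, so there is no per-row function call. The helper name `prev_window_sum` and its `width = 3` default (the three years before each row, excluding the row's own year) are assumptions. The matrix is n-by-n per category, so memory grows quadratically with group size.

```r
# Hypothetical helper: for each element of `year`, sum `incidents` over the
# `width` years strictly before it (years with no row simply contribute 0)
prev_window_sum <- function(year, incidents, width = 3) {
  # in_window[i, j] is TRUE when year[i] lies in [year[j] - width, year[j])
  in_window <- outer(year, year, function(a, b) a >= b - width & a < b)
  # multiplying recycles `incidents` down each column, so column j holds
  # the incidents of the rows inside year[j]'s window
  colSums(in_window * incidents)
}

# Hypothetical toy data: two categories, gaps in the years
dat <- data.frame(category  = rep(c("A", "B"), each = 3),
                  year      = rep(c(2000, 2001, 2004), 2),
                  incidents = c(1, 2, 3, 4, 5, 6))

# Apply the helper per category and put the results back in row order
dat$prev3 <- unsplit(
  lapply(split(dat, dat$category),
         function(d) prev_window_sum(d$year, d$incidents)),
  dat$category)
print(dat)
```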



Source: https://stackoverflow.com/questions/8947952/rolling-sum-on-an-unbalanced-time-series
