Count of unique values in a rolling date range for R

人盡茶涼 submitted on 2019-11-30 14:57:09

Here's something that works, taking advantage of data.table's new non-equi joins feature.

dt[dt[ , .(date3=date, date2 = date - 2, email)], 
   on = .(date >= date2, date<=date3), 
   allow.cartesian = TRUE
   ][ , .(count = uniqueN(email)), 
      by = .(date = date + 2)]
#          date count
# 1: 2011-12-30     3
# 2: 2011-12-31     3
# 3: 2012-01-01     3
# 4: 2012-01-02     3
# 5: 2012-01-03     1
# 6: 2012-01-04     2

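To be honest, I'm a bit mystified as to how exactly this works, but the idea is to join dt to itself on date, matching every row whose date falls between two days ago and today. I'm not sure why we have to clean up by setting date = date + 2 afterwards; it looks like the joined result reports date using the values of date2 (the start of the window) rather than the matched rows' own dates.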
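To see what the self-join actually produces, here is a minimal sketch on made-up data. The five rows below are hypothetical (not the data from the question), and `windows`, `joined` and `res` are names chosen just for illustration. As far as I can tell, after a non-equi join the `date` column of the result holds the lower bound `date2` taken from the inner table, which would explain the `date + 2` step.

```r
library(data.table)

# Hypothetical toy data, invented for illustration
dt <- data.table(
  date  = as.Date("2020-01-01") + c(0, 0, 1, 2, 4),
  email = c("a", "b", "a", "c", "a")
)

# One window [date - 2, date] per row of dt
windows <- dt[, .(date3 = date, date2 = date - 2, email)]

# Each window picks up every row of dt whose date falls inside it
joined <- dt[windows,
             on = .(date >= date2, date <= date3),
             allow.cartesian = TRUE]

# In the joined table, `date` carries the window start (date2),
# so its earliest value lies before any date present in dt
min(joined$date) < min(dt$date)

# Shift back by 2 days so each window is labelled by its end date
res <- joined[, .(count = uniqueN(email)), by = .(date = date + 2)]
res
```

Printing `joined` before aggregating is a quick way to check which rows each window matched.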


Here's an approach using keys:

setkey(dt, date)

dt[ , .(count = dt[.(seq.Date(.BY$date - 2L, .BY$date, "day")),
                   uniqueN(email), nomatch = 0L]), by = date]
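A quick way to sanity-check the keyed version is to compare it with a plain base R loop on a small table. This is only a sketch: the data is made up, and `ends` and `brute` are illustrative names, not part of the answer.

```r
library(data.table)

# Hypothetical toy data, invented for illustration
dt <- data.table(
  date  = as.Date("2020-01-01") + c(0, 0, 1, 2, 4),
  email = c("a", "b", "a", "c", "a")
)
setkey(dt, date)

# Keyed lookup: for each date, fetch the rows for that day and the two before it
res <- dt[, .(count = dt[.(seq.Date(.BY$date - 2L, .BY$date, "day")),
                         uniqueN(email), nomatch = 0L]),
          by = date]

# Base R reference: count distinct emails in [d - 2, d] for each date d
ends  <- unique(dt$date)
brute <- vapply(seq_along(ends), function(i) {
  in_window <- dt$date >= ends[i] - 2 & dt$date <= ends[i]
  length(unique(dt$email[in_window]))
}, integer(1))

# Both approaches should agree
stopifnot(all(res$count == brute))
```

Note that this approach needs `date` to be a real `Date` column, since `seq.Date` builds the list of days to look up.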

With the recently implemented non-equi joins feature in the current development version of data.table, v1.9.7 (on CRAN since v1.9.8), this can be done as follows. Note that the code assumes dt already has a date2 column:

dt[.(date3=unique(dt$date2)), .(count=uniqueN(email)), on=.(date>=date3, date2<=date3), by=.EACHI]
#          date      date2 count
# 1: 2011-12-30 2011-12-30     3
# 2: 2011-12-31 2011-12-31     3
# 3: 2012-01-01 2012-01-01     3
# 4: 2012-01-02 2012-01-02     3
# 5: 2012-01-03 2012-01-03     1
# 6: 2012-01-04 2012-01-04     2
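Since the definition of `date2` is not shown here, the following is a variant of the same `by = .EACHI` idea rather than the exact code above. It is a sketch on made-up data; the helper column `upper` and the lookup table `ends` are assumptions introduced for illustration.

```r
library(data.table)

# Hypothetical toy data, invented for illustration
dt <- data.table(
  date  = as.Date("2020-01-01") + c(0, 0, 1, 2, 4),
  email = c("a", "b", "a", "c", "a")
)

# Assumed helper column: the last window end date that still covers this row
dt[, upper := date + 2]

# One row per window end date d
ends <- data.table(d = unique(dt$date))

# A row belongs to the window ending at d when date <= d <= upper;
# by = .EACHI evaluates j once for each row of `ends`
res <- dt[ends, .(count = uniqueN(email)),
          on = .(date <= d, upper >= d),
          by = .EACHI]

# Both join columns come back holding d, so keep just one of them
res <- res[, .(date, count)]
res
```

Compared with the self-join, this avoids materialising the full cartesian result, because the count is computed per window as the join runs.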