Aggregate results by date intervals in R

Submitted by 依然范特西╮ on 2021-01-29 08:27:58

Question


I'm using R, and my data is stored in data.table objects. Each row has the format ID, Date1, Date2, Row.

Each ID can have more than one entry, and the two dates define a time interval.

I want to aggregate all entries by ID and by overlapping time intervals. I know how to do this with for loops, but I wonder whether there is a better way.

Example:

library(data.table)
data = data.table(
    id = c(1,1,1,2,2,3,3),
    Row = c(1,2,3,4,5,6,7),
    Date1 = c("2018-01-01",
              "2018-01-05",
              "2018-01-21",
              "2018-01-01",
              "2018-01-15",
              "2018-01-01",
              "2018-01-19"),
    Date2 = c("2018-01-10",
              "2018-01-20",
              "2018-01-22",
              "2018-01-31",
              "2018-01-19",
              "2018-01-15",
              "2018-01-23"))

The desired output identifies the following groups of rows: ((1,2),(3),(4,5),(6),(7)), so that I can generate a new ID based on this grouping.


Answer 1:


Adapted from "How to flatten / merge overlapping time periods", with a running group number carried across ids:

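# Assumes Date1/Date2 are Date class (see the data block below) and rows are sorted by id, Date1.
s <- 0L  # next free group number, carried over from one id to the next
data[, g := {
        # start a new group whenever the next start date is after the latest end date so far
        r <- s + c(0L, cumsum(shift(Date1, -1L) > cummax(as.integer(Date2)))[-.N])
        s <- r[.N] + 1L  # first group number for the next id
        r
    }, by=.(id)]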

output:

   id Row      Date1      Date2 g
1:  1   1 2018-01-01 2018-01-10 0
2:  1   2 2018-01-05 2018-01-20 0
3:  1   3 2018-01-21 2018-01-22 1
4:  2   4 2018-01-01 2018-01-31 2
5:  2   5 2018-01-15 2018-01-19 2
6:  3   6 2018-01-01 2018-01-15 3
7:  3   7 2018-01-19 2018-01-23 4

data (with the date columns converted to Date class):

library(data.table)
data = data.table(
    id = c(1,1,1,2,2,3,3),
    Row = c(1,2,3,4,5,6,7),
    Date1 = c("2018-01-01","2018-01-05","2018-01-21","2018-01-01","2018-01-15","2018-01-01","2018-01-19"),
    Date2 = c("2018-01-10","2018-01-20","2018-01-22","2018-01-31","2018-01-19","2018-01-15","2018-01-23"))
cols <- c("Date1", "Date2")
data[, (cols) := lapply(.SD, as.Date, format="%Y-%m-%d"), .SDcols=cols]
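The same grouping can also be built without carrying a counter between groups. The following is a minimal sketch, not part of the original answer: it numbers the overlapping blocks within each id and then uses data.table's rleid() to turn each (id, block) pair into one running number. The column names block and new_id are illustrative, and the sketch assumes the intervals are sorted by start date within each id.

```r
library(data.table)

data <- data.table(
    id    = c(1, 1, 1, 2, 2, 3, 3),
    Row   = 1:7,
    Date1 = as.Date(c("2018-01-01", "2018-01-05", "2018-01-21", "2018-01-01",
                      "2018-01-15", "2018-01-01", "2018-01-19")),
    Date2 = as.Date(c("2018-01-10", "2018-01-20", "2018-01-22", "2018-01-31",
                      "2018-01-19", "2018-01-15", "2018-01-23"))
)

# The overlap test below assumes rows are sorted by start date within each id.
setorder(data, id, Date1)

# Within each id, start a new block whenever a row begins after the latest
# end date seen so far (strict ">", so intervals that touch stay together).
data[, block := {
    latest_end <- cummax(as.integer(Date2))
    starts_new <- as.integer(Date1)[-1L] > latest_end[-.N]
    cumsum(c(TRUE, starts_new))
}, by = id]

# Turn each (id, block) pair into one running group number across the table.
data[, new_id := rleid(id, block)]

# List the Row values belonging to each new group.
split(data$Row, data$new_id)
```

Here new_id starts at 1 rather than 0, and split() returns the rows of each group as a list, which corresponds to the grouping requested in the question.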


Source: https://stackoverflow.com/questions/57371970/aggregate-results-by-date-intervals-in-r
