I have a panel with many IDs, each with begin and end dates. Each begin-to-end pair defines an interval of time.
Here is one way to do it, using int_overlaps from lubridate. I have defined the intervals from the begin and end dates, although in your data they are different; perhaps you could clarify which is correct.
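In case it helps to reproduce this, df can be rebuilt from the id, begin and end columns of the output printed further down. This is only a sketch: the dates are copied from that output, and storing them as Date (rather than character or POSIXct) is my assumption about your data.

```r
# Dates copied from the printed output in this answer;
# both ids carry the same eight begin/end pairs there.
begin <- c("2010-01-31", "2011-01-31", "2012-01-31", "2013-01-31",
           "2013-02-28", "2015-02-28", "2015-06-30", "2015-09-30")
end   <- c("2011-06-30", "2012-06-30", "2013-06-30", "2014-06-30",
           "2013-07-31", "2015-03-31", "2015-07-31", "2016-01-31")

# Assumption: dates stored as Date; as.POSIXct() accepts these directly.
df <- data.frame(id    = rep(1:2, each = 8),
                 begin = as.Date(rep(begin, 2)),
                 end   = as.Date(rep(end, 2)))
```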
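library(lubridate)

# build one interval per row from the begin and end dates
df$interval <- interval(as.POSIXct(df$begin), as.POSIXct(df$end))

# needs to be sorted by id for the next stage to work: tapply returns its
# results grouped by id, so unlist only lines up with the rows if df is sorted
df <- df[order(df$id), ]

# each interval overlaps itself, so a row sum above 1 means that interval
# overlaps at least one other interval in the subset for that id
df$overlap <- unlist(tapply(df$interval,  # loop through intervals
                            df$id,        # grouped by id
                            function(x) rowSums(outer(x, x, int_overlaps)) > 1))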
df
   id      begin        end                       interval overlap
1   1 2010-01-31 2011-06-30 2010-01-31 UTC--2011-06-30 UTC    TRUE
2   1 2011-01-31 2012-06-30 2011-01-31 UTC--2012-06-30 UTC    TRUE
3   1 2012-01-31 2013-06-30 2012-01-31 UTC--2013-06-30 UTC    TRUE
4   1 2013-01-31 2014-06-30 2013-01-31 UTC--2014-06-30 UTC    TRUE
5   1 2013-02-28 2013-07-31 2013-02-28 UTC--2013-07-31 UTC    TRUE
6   1 2015-02-28 2015-03-31 2015-02-28 UTC--2015-03-31 UTC   FALSE
7   1 2015-06-30 2015-07-31 2015-06-30 UTC--2015-07-31 UTC   FALSE
8   1 2015-09-30 2016-01-31 2015-09-30 UTC--2016-01-31 UTC   FALSE
9   2 2010-01-31 2011-06-30 2010-01-31 UTC--2011-06-30 UTC    TRUE
10  2 2011-01-31 2012-06-30 2011-01-31 UTC--2012-06-30 UTC    TRUE
11  2 2012-01-31 2013-06-30 2012-01-31 UTC--2013-06-30 UTC    TRUE
12  2 2013-01-31 2014-06-30 2013-01-31 UTC--2014-06-30 UTC    TRUE
13  2 2013-02-28 2013-07-31 2013-02-28 UTC--2013-07-31 UTC    TRUE
14  2 2015-02-28 2015-03-31 2015-02-28 UTC--2015-03-31 UTC   FALSE
15  2 2015-06-30 2015-07-31 2015-06-30 UTC--2015-07-31 UTC   FALSE
16  2 2015-09-30 2016-01-31 2015-09-30 UTC--2016-01-31 UTC   FALSE
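If you would rather not depend on lubridate, the same pairwise idea can be sketched in base R. This is not the code that produced the output above, just an illustration. It assumes Date columns and treats interval endpoints as inclusive, which I believe matches int_overlaps but have not verified for every version. The names df2 and has_overlap are made up for the example. Because results are written back by row index, the data does not need to be sorted by id first.

```r
# Hypothetical toy data: a few begin/end pairs in the style of the output above.
df2 <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  begin = as.Date(c("2010-01-31", "2011-01-31", "2015-02-28",
                    "2010-01-31", "2015-02-28")),
  end   = as.Date(c("2011-06-30", "2012-06-30", "2015-03-31",
                    "2011-06-30", "2015-03-31"))
)

# TRUE for each interval that overlaps at least one other interval in b/e.
has_overlap <- function(b, e) {
  b <- as.numeric(b)
  e <- as.numeric(e)
  starts_before_end <- outer(b, e, "<=")   # [i, j]: b[i] <= e[j]
  # i and j overlap when b[i] <= e[j] and b[j] <= e[i] (inclusive endpoints)
  hits <- starts_before_end & t(starts_before_end)
  # every interval overlaps itself, hence the comparison with 1
  rowSums(hits) > 1
}

# Apply per id, writing results back by row index so row order does not matter.
df2$overlap <- NA
for (rows in split(seq_len(nrow(df2)), df2$id)) {
  df2$overlap[rows] <- has_overlap(df2$begin[rows], df2$end[rows])
}
df2
```

Here only the first two rows of id 1 overlap each other, so they are the only ones flagged TRUE.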