Question
I have a list of marked individuals (column Mark) that were captured in various years (column Year) within a range of the river (LocStart and LocEnd). Locations on the river are in meters.
I would like to know whether a marked individual has used overlapping ranges between years, i.e. whether the individual has returned to the same segment of the river from year to year.
Here is an example of the original data set:
ID  Mark  Year  LocStart  LocEnd
1   1081  1992  21,729    22,229
2   1081  1992  21,203    21,703
3   1081  2005  21,508    22,008
4   1126  1994  19,222    19,522
5   1126  1994  18,811    19,311
6   1283  2005  21,754    22,254
7   1283  2007  22,025    22,525
Here is what I would like the final answer to look like:
Mark  Year1  Year2  IDs
1081  1992   2005   1, 3
1081  1992   2005   2, 3
1283  2005   2007   6, 7
In this case, individual 1126 would not appear in the final output, because its only two ranges are from the same year. I realize it would be easy to remove all the records where Year1 = Year2.
I would like to do this in R and have looked into the IRanges package, but I have not been able to group by Mark or to extract the Year1 and Year2 information.
Answer 1:
Using the foverlaps() function from the data.table package:
library(data.table)

# dt holds the question's data as a data.frame
setkey(setDT(dt), Mark, LocStart, LocEnd)              ## (1)
olaps = foverlaps(dt, dt, type = "any", which = TRUE)  ## (2)
olaps = olaps[dt$Year[xid] != dt$Year[yid]]            ## (3)
olaps[, `:=`(Mark  = dt$Mark[xid],
             Year1 = dt$Year[xid],
             Year2 = dt$Year[yid],
             xid   = dt$ID[xid],
             yid   = dt$ID[yid])]                      ## (4)
olaps = olaps[xid < yid]                               ## (5)
#    xid yid Mark Year1 Year2
# 1:   2   3 1081  1992  2005
# 2:   1   3 1081  1992  2005
# 3:   6   7 1283  2005  2007
(1) We first convert the data.frame to a data.table by reference using setDT. Then we key the data.table on the columns Mark, LocStart and LocEnd, which allows us to perform overlapping range joins.
(2) We calculate self overlaps (dt with itself) with any type of overlap. But we return matching indices here using which = TRUE.
(3) Remove all indices where the Year corresponding to xid and yid is identical.
(4) Add all the other columns and replace xid and yid with the corresponding ID values, by reference.
(5) Remove all indices where xid >= yid. If row 1 overlaps with row 3, then row 3 also overlaps with row 1. We don't need both. foverlaps() doesn't have a way to remove this by default yet.
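As a cross-check of the steps above, here is a minimal base R sketch that applies the same logic to the question's sample data without data.table. It is not the answer's method: it swaps foverlaps() for a plain self-join with merge() followed by an explicit interval test. The names df, pairs, keep and result are made up for illustration, and the sketch assumes the locations have been read as plain numbers (no thousands separators).

```r
# Sample data from the question (locations in meters).
df <- data.frame(
  ID       = 1:7,
  Mark     = c(1081, 1081, 1081, 1126, 1126, 1283, 1283),
  Year     = c(1992, 1992, 2005, 1994, 1994, 2005, 2007),
  LocStart = c(21729, 21203, 21508, 19222, 18811, 21754, 22025),
  LocEnd   = c(22229, 21703, 22008, 19522, 19311, 22254, 22525)
)

# Self-join on Mark: every pair of captures of the same individual.
# Non-key columns get the suffixes "1" and "2" (ID1, ID2, Year1, ...).
pairs <- merge(df, df, by = "Mark", suffixes = c("1", "2"))

# Keep each unordered pair once (ID1 < ID2), drop same-year pairs, and
# keep only pairs whose closed intervals [LocStart, LocEnd] intersect.
keep <- pairs$ID1 < pairs$ID2 &
  pairs$Year1 != pairs$Year2 &
  pairs$LocStart1 <= pairs$LocEnd2 &
  pairs$LocStart2 <= pairs$LocEnd1

result <- pairs[keep, c("Mark", "Year1", "Year2", "ID1", "ID2")]
result <- result[order(result$Mark, result$ID1), ]
rownames(result) <- NULL
print(result)
```

On the sample data this leaves three rows, the ID pairs (1, 3), (2, 3) and (6, 7), which agrees with the output requested in the question. The self-join builds every within-group pair before filtering, so it is only meant for small inputs or for verifying results; the keyed foverlaps() join is likely the better fit for large tables.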
Source: https://stackoverflow.com/questions/28019283/range-overlap-intersect-by-group-and-between-years