Question
I have a list of marked individuals (column Mark) that were captured in various years (column Year) within a stretch of the river (from LocStart to LocEnd). Locations along the river are in meters.
I would like to know whether a marked individual has used overlapping ranges between years, i.e. whether the individual has returned to the same segment of the river from year to year.
Here is an example of the original data set:
ID   Mark   Year   LocStart   LocEnd
1    1081   1992   21,729     22,229
2    1081   1992   21,203     21,703
3    1081   2005   21,508     22,008
4    1126   1994   19,222     19,522
5    1126   1994   18,811     19,311
6    1283   2005   21,754     22,254
7    1283   2007   22,025     22,525
Here is what I would like the final answer to look like:
Mark   Year1   Year2   IDs
1081   1992    2005    1, 3
1081   1992    2005    2, 3
1283   2005    2007    6, 7
In this case, individual 1126 would not be in the final output, as its only two ranges are from the same year. I realize it would be easy to remove all the records where Year1 = Year2.
I would like to do this in R and have looked into the IRanges package, but I have not been able to group by Mark or to extract the Year1 and Year2 information.
Answer 1:
Using the foverlaps() function from the data.table package:
require(data.table)
setkey(setDT(dt), Mark, LocStart, LocEnd) ## (1)
olaps = foverlaps(dt, dt, type="any", which=TRUE) ## (2)
olaps = olaps[dt$Year[xid] != dt$Year[yid]] ## (3)
olaps[, `:=`(Mark  = dt$Mark[xid],
             Year1 = dt$Year[xid],
             Year2 = dt$Year[yid],
             xid   = dt$ID[xid],
             yid   = dt$ID[yid])] ## (4)
olaps = olaps[xid < yid] ## (5)
#    xid yid Mark Year1 Year2
# 1:   2   3 1081  1992  2005
# 2:   1   3 1081  1992  2005
# 3:   6   7 1283  2005  2007
(1) We first convert the data.frame to a data.table by reference using setDT. Then we key the data.table on the columns Mark, LocStart and LocEnd, which allows us to perform overlapping range joins.
(2) We calculate self overlaps (dt with itself) with any type of overlap, but return only the matching row indices by using which = TRUE.
(3) Remove all index pairs where the Year corresponding to xid and yid is identical.
(4) Add all the other columns and replace xid and yid with the corresponding ID values, by reference.
(5) Remove all pairs where xid >= yid. If row 1 overlaps with row 3, then row 3 also overlaps with row 1, and we don't need both. foverlaps() doesn't have a way to remove this by default yet.
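For anyone who wants to try this on the sample data from the question, here is a minimal self-contained sketch. It is a variation on the steps above, not the exact code: the names idx, res, ID1 and ID2 are my own, the location columns are entered as plain integers without the thousands separators, and I filter on Year1 < Year2, which is an assumption that drops the same-year pairs and the mirrored duplicates in one step so that the earlier year always ends up in Year1.

```r
library(data.table)

# Sample data from the question (locations in meters, no thousands separators).
dt <- data.table(
  ID       = 1:7,
  Mark     = c(1081L, 1081L, 1081L, 1126L, 1126L, 1283L, 1283L),
  Year     = c(1992L, 1992L, 2005L, 1994L, 1994L, 2005L, 2007L),
  LocStart = c(21729L, 21203L, 21508L, 19222L, 18811L, 21754L, 22025L),
  LocEnd   = c(22229L, 21703L, 22008L, 19522L, 19311L, 22254L, 22525L)
)

# Key on the grouping column followed by the interval start and end.
setkey(dt, Mark, LocStart, LocEnd)

# Self overlap join; which = TRUE returns row indices (xid, yid) into the keyed dt.
idx <- foverlaps(dt, dt, type = "any", which = TRUE)

# Look up the columns of interest for both sides of every overlapping pair.
res <- data.table(
  Mark  = dt$Mark[idx$xid],
  Year1 = dt$Year[idx$xid],
  Year2 = dt$Year[idx$yid],
  ID1   = dt$ID[idx$xid],
  ID2   = dt$ID[idx$yid]
)

# Assumption: keep each pair once, with the earlier year first.
# This also removes self matches and same-year overlaps (e.g. Mark 1126).
res <- res[Year1 < Year2]

# Collapse the two IDs into the "IDs" column requested in the question.
res[, IDs := paste(ID1, ID2, sep = ", ")]
res <- res[order(Mark, ID1), .(Mark, Year1, Year2, IDs)]
print(res)
```

Note that the indices returned by foverlaps() refer to dt after setkey() has reordered it, which is why the ID values are looked up through dt$ID rather than used directly.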
Source: https://stackoverflow.com/questions/28019283/range-overlap-intersect-by-group-and-between-years