R overlap multiple GRanges with findOverlaps()

Submitted by 主宰稳场 on 2019-12-21 17:54:18

Question


I have three tables with differing genomic intervals. Here is an example:

> a
   chr interval.start interval.end names
1 chr1              5           10     a
2 chr1              6           10     b
3 chr2              7           10     c
4 chr3              8           10     d

> b
   chr interval.start interval.end names
1 chr1              6           15     e
2 chr1              7           15     f
3 chr1              8           15     g

> c
   chr interval.start interval.end names
1 chr1              7           12     h
2 chr1              8           12     i
3 chr5              9           12     j
4 chr10             10          12     k
5 chr20             11          12     l

I am trying to find the common intervals between all tables after converting them to GRanges objects. Essentially I want to do something like intersect(c, intersect(a, b)). However, because I am working with genomic coordinates, I have to do this with GRanges objects and the GenomicRanges package, which I am not familiar with.
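A minimal sketch of the conversion step, assuming the three tables are plain data frames with the column names shown above and that Bioconductor's GenomicRanges package is installed. The object names df_a, gr_a and so on, and the helper to_granges(), are hypothetical. makeGRangesFromDataFrame() is told which columns hold the chromosome, start and end, and the names column is attached as the range names.

```r
# Assumes GenomicRanges is installed from Bioconductor,
# e.g. with BiocManager::install("GenomicRanges")
library(GenomicRanges)

# The three example tables as data frames (hypothetical names df_a, df_b, df_c)
df_a <- data.frame(chr = c("chr1", "chr1", "chr2", "chr3"),
                   interval.start = c(5, 6, 7, 8),
                   interval.end = c(10, 10, 10, 10),
                   names = c("a", "b", "c", "d"))
df_b <- data.frame(chr = c("chr1", "chr1", "chr1"),
                   interval.start = c(6, 7, 8),
                   interval.end = c(15, 15, 15),
                   names = c("e", "f", "g"))
df_c <- data.frame(chr = c("chr1", "chr1", "chr5", "chr10", "chr20"),
                   interval.start = c(7, 8, 9, 10, 11),
                   interval.end = c(12, 12, 12, 12, 12),
                   names = c("h", "i", "j", "k", "l"))

# Hypothetical helper: convert one table to a GRanges object.
# Starts are treated as 1-based (default starts.in.df.are.0based = FALSE).
to_granges <- function(df) {
  gr <- makeGRangesFromDataFrame(df,
                                 seqnames.field = "chr",
                                 start.field = "interval.start",
                                 end.field = "interval.end")
  names(gr) <- as.character(df$names)  # keep the interval labels as range names
  gr
}

gr_a <- to_granges(df_a)
gr_b <- to_granges(df_b)
gr_c <- to_granges(df_c)
gr_a
```

No strand column is given, so every range gets strand "*". If the tables came from a 0-based format such as BED, starts.in.df.are.0based = TRUE would shift the starts by one.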

I can do findOverlaps(gr, gr1) or findOverlaps(gr1, gr2), but is there an easy way to overlap multiple GRanges at once like findOverlaps(gr, gr1, gr2)?

Any help would be appreciated. If this question has already been asked elsewhere, I apologize in advance.

Thanks


Answer 1:


You can use subsetByOverlaps for one pairwise comparison and then compare the resulting subset with the third set:

sub1 <- subsetByOverlaps(gr, gr1)
sub2 <- subsetByOverlaps(sub1, gr2)

Or directly

Reduce(subsetByOverlaps, list(gr, gr1, gr2))

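This returns the ranges of gr that overlap at least one range in gr1 and at least one range in gr2. The ranges are returned whole, and the two overlaps do not have to fall in the same region.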
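A sketch applying both forms of this answer to the example tables from the question, assuming GenomicRanges is installed; the helper make_gr() and the object names are hypothetical. Reduce() folds subsetByOverlaps() from left to right, so the one-line call computes the same thing as the two-step version.

```r
library(GenomicRanges)

# Hypothetical helper that builds a named GRanges from plain vectors
make_gr <- function(chr, start, end, ids) {
  gr <- GRanges(seqnames = chr, ranges = IRanges(start = start, end = end))
  names(gr) <- ids
  gr
}

# The example tables from the question
gr_a <- make_gr(c("chr1", "chr1", "chr2", "chr3"), c(5, 6, 7, 8), rep(10, 4),
                c("a", "b", "c", "d"))
gr_b <- make_gr(rep("chr1", 3), c(6, 7, 8), rep(15, 3), c("e", "f", "g"))
gr_c <- make_gr(c("chr1", "chr1", "chr5", "chr10", "chr20"), 7:11, rep(12, 5),
                c("h", "i", "j", "k", "l"))

# Two-step version: ranges of gr_a hitting gr_b, then those also hitting gr_c
sub1 <- subsetByOverlaps(gr_a, gr_b)
sub2 <- subsetByOverlaps(sub1, gr_c)

# One-step version: subsetByOverlaps(subsetByOverlaps(gr_a, gr_b), gr_c)
res <- Reduce(subsetByOverlaps, list(gr_a, gr_b, gr_c))
res
```

For this data both versions should return the ranges named a (chr1:5-10) and b (chr1:6-10), that is, whole ranges from gr_a rather than the shared region. R may also warn that the objects have sequence levels not in each other; that is expected here because the tables cover different chromosomes.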

Depending on the type of overlap you want and which has the largest ranges, you should consider which to use as the query and which the subject.
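A sketch of why the query/subject choice matters, on the same example data and assuming GenomicRanges is installed (make_gr() is again a hypothetical helper). subsetByOverlaps() always returns ranges from its first argument, and its type argument ("any", "start", "end", "within", "equal") changes what counts as an overlap; with type = "within" the query range must lie entirely inside a subject range.

```r
library(GenomicRanges)

# Hypothetical helper that builds a named GRanges from plain vectors
make_gr <- function(chr, start, end, ids) {
  gr <- GRanges(seqnames = chr, ranges = IRanges(start = start, end = end))
  names(gr) <- ids
  gr
}

gr_a <- make_gr(c("chr1", "chr1", "chr2", "chr3"), c(5, 6, 7, 8), rep(10, 4),
                c("a", "b", "c", "d"))
gr_b <- make_gr(rep("chr1", 3), c(6, 7, 8), rep(15, 3), c("e", "f", "g"))

# The result always comes from the first argument (the query)
any_ab <- subsetByOverlaps(gr_a, gr_b)  # ranges of gr_a: a, b
any_ba <- subsetByOverlaps(gr_b, gr_a)  # ranges of gr_b: e, f, g

# type = "within": the query range must be contained in a subject range
within_ab <- subsetByOverlaps(gr_a, gr_b, type = "within")  # b (6-10 lies inside e, 6-15)
within_ba <- subsetByOverlaps(gr_b, gr_a, type = "within")  # empty: gr_b ranges end at 15, past every gr_a range
```

Swapping the arguments changes both which object the ranges come from and, for the stricter overlap types, whether anything is returned at all.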




Answer 2:


The following returns the exact intersection of all the ranges:

Reduce(intersect, list(gr, gr1, gr2))

By contrast, in

Reduce(subsetByOverlaps, list(gr, gr1, gr2))

subsetByOverlaps treats the first GRanges object (gr) as the query and returns the query's own ranges, with their original coordinates, that overlap at least one range in gr1 and, in the second step, at least one range in gr2. It does not trim them to the shared region. So to find the common intervals (the regions of intersection), intersect is the appropriate function.
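A sketch contrasting the two calls on the example data from the question, assuming GenomicRanges is installed (make_gr() is a hypothetical helper). intersect() on GRanges works on the genomic positions covered by each object, so the result is the reduced region shared by all three tables, and it carries no names from the original ranges.

```r
library(GenomicRanges)

# Hypothetical helper that builds a named GRanges from plain vectors
make_gr <- function(chr, start, end, ids) {
  gr <- GRanges(seqnames = chr, ranges = IRanges(start = start, end = end))
  names(gr) <- ids
  gr
}

gr_a <- make_gr(c("chr1", "chr1", "chr2", "chr3"), c(5, 6, 7, 8), rep(10, 4),
                c("a", "b", "c", "d"))
gr_b <- make_gr(rep("chr1", 3), c(6, 7, 8), rep(15, 3), c("e", "f", "g"))
gr_c <- make_gr(c("chr1", "chr1", "chr5", "chr10", "chr20"), 7:11, rep(12, 5),
                c("h", "i", "j", "k", "l"))

# Whole ranges of gr_a that overlap something in gr_b and something in gr_c
hits <- Reduce(subsetByOverlaps, list(gr_a, gr_b, gr_c))

# Positions covered by all three objects, merged into reduced ranges
common <- Reduce(intersect, list(gr_a, gr_b, gr_c))
common
```

With this data, hits should contain chr1:5-10 and chr1:6-10, whereas common should be the single range chr1:7-10: gr_a and gr_b share chr1:6-10, and gr_c trims that to 7-10.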



Source: https://stackoverflow.com/questions/23331475/r-overlap-multiple-granges-with-findoverlaps
