Finding overlapping ranges between two interval datasets

陌清茗 2020-12-03 22:47

I have one table with coordinates (start, end) of ca. 500000 fragments and another table with 60000 single coordinates that I would like to match w

2 Answers
  •  不知归路
    2020-12-03 23:29

    In general, the Bioconductor package IRanges is well suited to problems involving intervals. It finds overlaps efficiently by indexing the ranges (originally with an interval tree; recent IRanges releases use a nested containment list instead) rather than comparing every pair. GenomicRanges builds on top of IRanges specifically for handling, well, "genomic ranges".
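To illustrate why an indexed lookup beats a brute-force scan of 500000 x 60000 pairs, here is a minimal base-R sketch of point-in-interval matching. It is a swapped-in technique, not what IRanges does internally: sort the fragments by start, keep a running maximum of their end coordinates, binary-search (`findInterval`) for the last fragment starting at or before each point, then walk backwards until the running maximum drops below the point. It assumes closed intervals on a single chromosome; the function name `points_in_intervals` and the toy data are hypothetical.

```r
# Hypothetical helper (not part of IRanges/GenomicRanges): for each point, find every
# closed interval [start, end] that contains it. Assumes all inputs are on one chromosome.
# Returns (query, subject) index pairs, similar in spirit to findOverlaps(points, intervals).
points_in_intervals <- function(starts, ends, points) {
  ord <- order(starts)             # sort intervals by start coordinate
  s <- starts[ord]
  e <- ends[ord]
  max_end <- cummax(e)             # max_end[i] = largest end among the first i sorted intervals
  q_hits <- integer(0)
  s_hits <- integer(0)
  for (q in seq_along(points)) {
    p <- points[q]
    i <- findInterval(p, s)        # number of intervals with start <= p (binary search)
    # Walk backwards; once max_end[i] < p, no earlier interval can reach p
    while (i > 0 && max_end[i] >= p) {
      if (e[i] >= p) {
        q_hits <- c(q_hits, q)
        s_hits <- c(s_hits, ord[i])  # report the index in the original (unsorted) input
      }
      i <- i - 1
    }
  }
  data.frame(query = q_hits, subject = s_hits)
}

# Hypothetical toy data: four fragments and three single coordinates on one chromosome
hits <- points_in_intervals(
  starts = c(100, 200, 250, 400),
  ends   = c(200, 400, 320, 450),
  points = c(150, 300, 500)
)
print(hits)
```

For several chromosomes you would run this per chromosome (e.g. after `split()`). The backward walk can degrade towards a linear scan when one very long fragment keeps the running maximum high, which is one reason to prefer the real interval index in IRanges for data of this size.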

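    library(GenomicRanges)   # Bioconductor package; also loads IRanges
    library(data.table)

    # Use identical chromosome levels for both objects so their seqlevels match
    chrLevels = c("1", "2", "X", "Y")
    gr1 = with(dtFrags, GRanges(Rle(factor(chr, levels=chrLevels)), IRanges(start, end)))
    # Each single coordinate becomes a width-1 range
    gr2 = with(dtCoords, GRanges(Rle(factor(chr, levels=chrLevels)), IRanges(coord, width=1)))
    # queryHits() index rows of dtCoords, subjectHits() index rows of dtFrags
    olaps = findOverlaps(gr2, gr1)
    # Give every coordinate a group id, then tag each overlapping fragment with it
    dtCoords[, grp := seq_len(nrow(dtCoords))]
    # Note: a fragment holds only one grp, so if it overlaps several coordinates only one is kept
    dtFrags[subjectHits(olaps), grp := queryHits(olaps)]
    setkey(dtCoords, grp)
    setkey(dtFrags, grp)
    # Keyed join: one row per (coordinate, fragment) pair; NA where a coordinate has no fragment
    dtFrags[, list(grp, id, type)][dtCoords]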
    
       grp id   type id.1 chr coord
    1:   1  1   exon   10   1   150
    2:   2  2 intron   20   2   300
    3:   2  4   exon   20   2   300
    4:   3 NA     NA   30   Y   500
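Because the line `dtFrags[subjectHits(olaps), grp := queryHits(olaps)]` stores a single `grp` per fragment, a fragment that overlaps several coordinates keeps only one of them. A minimal base-R sketch of a many-to-many alternative, assuming the hits have been pulled into a data frame with hypothetical columns `query` and `subject` (for example `data.frame(query = queryHits(olaps), subject = subjectHits(olaps))`): build one row per hit, then left-join onto the coordinates with `merge(..., all.x = TRUE)` so coordinates without a fragment still get an NA row. The function name `left_join_hits` and the toy tables are illustrative only.

```r
# Hypothetical helper: left-join single coordinates to fragments using overlap hits.
# `hits` has columns query (row in coords) and subject (row in frags), one row per overlap,
# so a coordinate may match several fragments and a fragment several coordinates.
left_join_hits <- function(coords, frags, hits) {
  pairs <- data.frame(
    coord_row = hits$query,
    frag_id   = frags$id[hits$subject],
    frag_type = frags$type[hits$subject]
  )
  coords$coord_row <- seq_len(nrow(coords))  # row key shared with `pairs`
  # all.x = TRUE keeps coordinates without any overlapping fragment (fragment columns become NA)
  out <- merge(coords, pairs, by = "coord_row", all.x = TRUE)
  out$coord_row <- NULL
  out
}

# Hypothetical toy tables loosely mirroring the column names used in the answer
coords <- data.frame(id = c(10, 20, 30), chr = c("1", "2", "Y"), coord = c(150, 300, 500))
frags  <- data.frame(id = c(1, 2, 3, 4), type = c("exon", "intron", "exon", "exon"))
hits   <- data.frame(query = c(1L, 2L, 2L), subject = c(1L, 2L, 4L))

joined <- left_join_hits(coords, frags, hits)
print(joined)
```

The same idea works with data.table by binding `dtCoords[queryHits(olaps)]` and `dtFrags[subjectHits(olaps)]` side by side; base `merge` is used here only so the sketch runs without extra packages.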
    
