Finding overlapping ranges between two sets of interval data

陌清茗 2020-12-03 22:47

I have one table with coordinates (start, end) of ca. 500000 fragments and another table with 60000 single coordinates that I would like to match w

2 Answers
  • 2020-12-03 23:22

    Does this work? You can merge on chr first and then subset to the rows where the coordinate falls inside the fragment:

    > kk <- merge(dtFrags, dtCoords, by="chr", all.x=TRUE)
    > kk
       chr id.x start end   type id.y coord
    1:   1    1   100 200   exon   10   150
    2:   2    2   300 500 intron   20   300
    3:   2    4   250 600   exon   20   300
    4:   X    3   400 600 intron   NA    NA

    > kk[coord>=start & coord<=end]
       chr id.x start end   type id.y coord
    1:   1    1   100 200   exon   10   150
    2:   2    2   300 500 intron   20   300
    3:   2    4   250 600   exon   20   300
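
The merge above pairs every fragment with every coordinate on the same chromosome before filtering, which can get large with 500000 fragments and 60000 coordinates. Here is a minimal sketch of an alternative using a data.table non-equi join, assuming data.table 1.12.0 or later. The example tables are reconstructed from the printed output in this thread, and the output column names (coord_id, frag_id, frag_start, frag_end) are made up for illustration.

```r
library(data.table)

# Example data reconstructed from the output shown in this thread (an assumption)
dtFrags <- data.table(
  id    = 1:4,
  chr   = c("1", "2", "X", "2"),
  start = c(100L, 300L, 400L, 250L),
  end   = c(200L, 500L, 600L, 600L),
  type  = c("exon", "intron", "intron", "exon")
)
dtCoords <- data.table(
  id    = c(10L, 20L, 30L),
  chr   = c("1", "2", "Y"),
  coord = c(150L, 300L, 500L)
)

# Non-equi join: same chr, and start <= coord <= end.
# x.* refers to columns of dtFrags, i.* to columns of dtCoords.
# nomatch = NULL drops coordinates that fall in no fragment.
res <- dtFrags[dtCoords,
               on = .(chr, start <= coord, end >= coord),
               nomatch = NULL,
               .(chr        = i.chr,
                 coord_id   = i.id,
                 coord      = i.coord,
                 frag_id    = x.id,
                 frag_start = x.start,
                 frag_end   = x.end,
                 type       = x.type)]
print(res)
```

This only materialises the matching pairs, so it never builds the full per-chromosome cross product. Dropping nomatch = NULL should keep unmatched coordinates with NA in the fragment columns.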
    
  • 2020-12-03 23:29

    In general, the Bioconductor package IRanges is a very good fit for problems involving intervals. It handles them efficiently by implementing an interval tree. GenomicRanges is another package that builds on top of IRanges, specifically for handling, well, "Genomic Ranges".

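    library(data.table)
    library(GenomicRanges)
    # Fragments as ranges; chr levels are shared so both objects use the same seqlevels
    gr1 = with(dtFrags, GRanges(Rle(factor(chr, 
              levels=c("1", "2", "X", "Y"))), IRanges(start, end)))
    # Each single coordinate becomes a width-1 range
    gr2 = with(dtCoords, GRanges(Rle(factor(chr, 
              levels=c("1", "2", "X", "Y"))), IRanges(coord, coord)))
    # query = coordinates, subject = fragments
    olaps = findOverlaps(gr2, gr1)
    # Give each coordinate a group id and copy it onto the fragments it hits
    dtCoords[, grp := seq_len(nrow(dtCoords))]
    dtFrags[subjectHits(olaps), grp := queryHits(olaps)]
    # Keyed join on grp
    setkey(dtCoords, grp)
    setkey(dtFrags, grp)
    dtFrags[, list(grp, id, type)][dtCoords]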
    
       grp id   type id.1 chr coord
    1:   1  1   exon   10   1   150
    2:   2  2 intron   20   2   300
    3:   2  4   exon   20   2   300
    4:   3 NA     NA   30   Y   500
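
One thing to watch with the grp approach: each fragment row holds a single grp value, so a fragment that contains several coordinates keeps only one of them. A sketch that builds the result straight from the Hits object instead, so every coordinate/fragment pair gets its own row. The example tables are reconstructed from the output in this thread, and the names hits, coord_id and frag_id are illustrative, not part of either package.

```r
library(data.table)
library(GenomicRanges)

# Example data reconstructed from the output shown in this thread (an assumption)
dtFrags <- data.table(
  id    = 1:4,
  chr   = c("1", "2", "X", "2"),
  start = c(100L, 300L, 400L, 250L),
  end   = c(200L, 500L, 600L, 600L),
  type  = c("exon", "intron", "intron", "exon")
)
dtCoords <- data.table(
  id    = c(10L, 20L, 30L),
  chr   = c("1", "2", "Y"),
  coord = c(150L, 300L, 500L)
)

# Shared chromosome levels so both GRanges objects have the same seqlevels
lv  <- union(dtFrags$chr, dtCoords$chr)
gr1 <- GRanges(factor(dtFrags$chr,  levels = lv),
               IRanges(dtFrags$start, dtFrags$end))
gr2 <- GRanges(factor(dtCoords$chr, levels = lv),
               IRanges(dtCoords$coord, dtCoords$coord))

# query = coordinates, subject = fragments
olaps <- findOverlaps(gr2, gr1)
q <- queryHits(olaps)
s <- subjectHits(olaps)

# One row per overlapping (coordinate, fragment) pair
hits <- data.table(
  coord_id = dtCoords$id[q],
  chr      = dtCoords$chr[q],
  coord    = dtCoords$coord[q],
  frag_id  = dtFrags$id[s],
  type     = dtFrags$type[s]
)
print(hits)
```

Coordinates with no overlapping fragment (Y:500 here) do not appear in hits. If they are needed, something like merge(dtCoords, hits, by.x = "id", by.y = "coord_id", all.x = TRUE) should bring them back with NA in the fragment columns.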
    