Find point-to-range overlaps

你的背包 2021-01-22 19:18

I have a dataframe df1:

df1 <- read.table(text = "Chr06  79641
Chr06   82862
Chr06   387314
Chr06   656098
Chr06   678491
Chr06   1018696", heade

3 Answers
  •  抹茶落季
    2021-01-22 20:04

    Using GenomicRanges:

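    library(GenomicRanges)

    # Convert to GRanges objects.
    # Each point in df1 becomes a width-1 range (start == end).
    gr1 <- GRanges(seqnames = df1$V1,
                   ranges = IRanges(df1$V2, df1$V2))

    gr2 <- GRanges(seqnames = df2$V1,
                   ranges = IRanges(df2$V2, df2$V3))

    # Keep only the points in gr1 that fall inside a range in gr2
    subsetByOverlaps(gr1, gr2)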
    
    # GRanges object with 3 ranges and 0 metadata columns:
    #       seqnames             ranges strand
    #          <Rle>          <IRanges>  <Rle>
    #  [1]    Chr06 [  82862,   82862]      *
    #  [2]    Chr06 [ 387314,  387314]      *
    #  [3]    Chr06 [1018696, 1018696]      *
    #   -------
    #   seqinfo: 1 sequence from an unspecified genome; no seqlengths
    
    # Or use mergeByOverlaps() to see each point next to the range it falls in
    mergeByOverlaps(gr1, gr2)
    
    # DataFrame with 3 rows and 2 columns
    #                          gr1                        gr2
    #                    <GRanges>                  <GRanges>
    # 1 Chr06:*:[  82862,   82862] Chr06:*:[  79720,   87043]
    # 2 Chr06:*:[ 387314,  387314] Chr06:*:[ 387314,  387371]
    # 3 Chr06:*:[1018696, 1018696] Chr06:*:[1018676, 1018736]
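
    If you need the matching rows of the original data frames rather than GRanges objects, the following is a minimal, self-contained sketch of the same approach. It assumes the GenomicRanges package is installed. The points are the six positions from the question. The ranges are an assumption: df2 is not shown above, so its three rows are rebuilt from the mergeByOverlaps() output, and the real df2 may well contain more ranges. The names pts_df, rng_df and the column names are illustrative only.

    ```r
    library(GenomicRanges)

    # Points from the question (df1), with illustrative column names.
    pts_df <- data.frame(chr = "Chr06",
                         pos = c(79641L, 82862L, 387314L,
                                 656098L, 678491L, 1018696L))

    # ASSUMPTION: ranges rebuilt from the mergeByOverlaps() output shown above;
    # the original df2 is not available here.
    rng_df <- data.frame(chr   = "Chr06",
                         start = c(79720L, 387314L, 1018676L),
                         end   = c(87043L, 387371L, 1018736L))

    # A point is represented as a range of width 1.
    gr_points <- GRanges(seqnames = pts_df$chr,
                         ranges   = IRanges(pts_df$pos, width = 1))
    gr_ranges <- GRanges(seqnames = rng_df$chr,
                         ranges   = IRanges(rng_df$start, rng_df$end))

    # Logical vector: TRUE where a point lies inside at least one range.
    in_range <- overlapsAny(gr_points, gr_ranges)
    pts_df[in_range, ]

    # Pair every point with each range that contains it.
    hits <- findOverlaps(gr_points, gr_ranges)
    matched <- cbind(pts_df[queryHits(hits), ],
                     rng_df[subjectHits(hits), c("start", "end")])
    matched
    ```

    overlapsAny() is enough when you only want to filter df1. findOverlaps() is the more general choice because it also reports which range each point hit, so a point covered by several ranges appears once per range.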
    

    Also, look into bedtools:

    Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.
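
    As a rough sketch of how the same question could be handed to bedtools from R: write both tables as BED files, then call bedtools intersect. Several things here are assumptions. The positions are treated as 1-based and inclusive (as GRanges treats them), so they are shifted to BED's 0-based, half-open convention. The range values are rebuilt from the mergeByOverlaps() output above because df2 itself is not shown. The helper names to_bed and write_bed are hypothetical. The bedtools call runs only if the executable is found on the PATH.

    ```r
    # Hypothetical helper: 1-based inclusive coordinates -> BED (0-based, half-open).
    to_bed <- function(chr, start, end = start) {
      data.frame(chrom = chr, start = start - 1L, end = end)
    }

    # Hypothetical helper: write a tab-separated BED file without a header.
    write_bed <- function(bed, path) {
      write.table(bed, path, sep = "\t", quote = FALSE,
                  row.names = FALSE, col.names = FALSE)
    }

    # Points from the question; each becomes a 1 bp interval.
    points_bed <- to_bed("Chr06", c(79641L, 82862L, 387314L,
                                    656098L, 678491L, 1018696L))

    # ASSUMPTION: ranges taken from the output shown above, not from the real df2.
    ranges_bed <- to_bed("Chr06",
                         c(79720L, 387314L, 1018676L),
                         c(87043L, 387371L, 1018736L))

    points_file <- tempfile(fileext = ".bed")
    ranges_file <- tempfile(fileext = ".bed")
    write_bed(points_bed, points_file)
    write_bed(ranges_bed, ranges_file)

    # -u reports each entry of -a once if it overlaps anything in -b.
    if (nzchar(Sys.which("bedtools"))) {
      overlapping <- system2("bedtools",
                             c("intersect", "-u",
                               "-a", points_file, "-b", ranges_file),
                             stdout = TRUE)
      print(overlapping)
    }
    ```

    The coordinate shift is the part most easily gotten wrong: a point at position p becomes the BED interval [p - 1, p), and a range keeps its end but has 1 subtracted from its start.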
