I have a dataframe df1:
df1 <- read.table(text = "Chr06 79641
Chr06 82862
Chr06 387314
Chr06 656098
Chr06 678491
Chr06 1018696", heade
Using GenomicRanges:
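library(GenomicRanges)

# Convert to GRanges objects; each df1 position becomes a width-1 range
gr1 <- GRanges(seqnames = df1$V1,
               ranges = IRanges(start = df1$V2, end = df1$V2))
gr2 <- GRanges(seqnames = df2$V1,
               ranges = IRanges(start = df2$V2, end = df2$V3))

# Keep only the gr1 positions that overlap at least one gr2 interval
subsetByOverlaps(gr1, gr2)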
# GRanges object with 3 ranges and 0 metadata columns:
#       seqnames             ranges strand
#          <Rle>          <IRanges>  <Rle>
#   [1]    Chr06 [  82862,   82862]      *
#   [2]    Chr06 [ 387314,  387314]      *
#   [3]    Chr06 [1018696, 1018696]      *
#   -------
#   seqinfo: 1 sequence from an unspecified genome; no seqlengths
# Or use mergeByOverlaps to pair each gr1 position with the gr2 interval it overlaps
mergeByOverlaps(gr1, gr2)
# DataFrame with 3 rows and 2 columns
#                            gr1                          gr2
#                      <GRanges>                    <GRanges>
# 1 Chr06:*:[  82862,   82862] Chr06:*:[  79720,   87043]
# 2 Chr06:*:[ 387314,  387314] Chr06:*:[ 387314,  387371]
# 3 Chr06:*:[1018696, 1018696] Chr06:*:[1018676, 1018736]
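If it helps to see what the overlap test amounts to, here is a minimal base-R sketch of the same point-in-interval check. It is only an illustration, not how GenomicRanges works internally. Note the assumption: df2 below is rebuilt from the three intervals printed in the mergeByOverlaps output above, so the real df2 may well contain more rows.

```r
# Positions from df1 in the question
df1 <- data.frame(V1 = "Chr06",
                  V2 = c(79641, 82862, 387314, 656098, 678491, 1018696),
                  stringsAsFactors = FALSE)

# ASSUMPTION: df2 is reconstructed from the intervals shown in the
# mergeByOverlaps output; the original df2 may have additional rows.
df2 <- data.frame(V1 = "Chr06",
                  V2 = c(79720, 387314, 1018676),
                  V3 = c(87043, 387371, 1018736),
                  stringsAsFactors = FALSE)

# For each df1 position, check whether it lies inside any df2 interval
# on the same chromosome (both interval ends inclusive)
in_any <- vapply(seq_len(nrow(df1)), function(i) {
  any(df2$V1 == df1$V1[i] &
      df2$V2 <= df1$V2[i] &
      df1$V2[i] <= df2$V3)
}, logical(1))

# Rows of df1 that fall inside an interval
hits <- df1[in_any, ]
print(hits)
```

This loops over every position and compares it against every interval, so it is fine for small tables but will be slow on genome-scale data, which is where GenomicRanges is the better choice.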
Also, look into bedtools:
Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.