I posted a question here: Matched Range Merge in R about merging two files based on a number in one file falling into a range in the second file. Thus far, I have been unsuc
I believe what you're asking for is a conditional join. They're easy in SQL, and the sqldf package makes it easy to query data frames in R using SQL.
Just pick a version depending on how you want unmatched SNPs handled: the inner join drops them, while the left join keeps them with NA in the gene columns.
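For anyone who wants to run the two versions end to end, here is a minimal sketch. The data frames are an assumption: they are rebuilt only from the rows visible in the outputs in this answer, so the real file1test and file2 may hold more rows or columns. It assumes the sqldf package is installed.

```r
library(sqldf)

# Assumed inputs, reconstructed from the rows visible in the printed outputs;
# the actual files may differ.
file1test <- data.frame(
  SNP = c("rs2343", "rs211", "rs754", "rs854", "rs343", "rs626"),
  BP  = c(860269, 369640, 861822, 367934, 706940, 717244),
  stringsAsFactors = FALSE
)
file2 <- data.frame(
  Gene     = c("E3543", "E11", "E613"),
  BP_start = c(860260, 861322, 367640),
  BP_end   = c(879955, 879533, 368634),
  stringsAsFactors = FALSE
)

# Inner join: only SNPs that fall inside at least one gene range are returned.
inner <- sqldf("select * from file1test f1 inner join file2 f2
                on (f1.BP > f2.BP_start and f1.BP <= f2.BP_end)")

# Left join: every SNP is returned; unmatched ones get NA in the gene columns.
left <- sqldf("select * from file1test f1 left join file2 f2
               on (f1.BP > f2.BP_start and f1.BP <= f2.BP_end)")

nrow(inner)             # number of matched SNP/gene pairs
nrow(left)              # matched pairs plus one row per unmatched SNP
sum(is.na(left$Gene))   # number of unmatched SNPs
```

A SNP that falls inside several overlapping ranges (rs754 here) appears once per matching gene in both versions.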
Inner join version:
> sqldf("select * from file1test f1 inner join file2 f2
+ on (f1.BP > f2.BP_start and f1.BP<= f2.BP_end) ")
Output:
     SNP     BP  Gene BP_start BP_end
1 rs2343 860269 E3543   860260 879955
2  rs754 861822 E3543   860260 879955
3  rs754 861822   E11   861322 879533
4  rs854 367934  E613   367640 368634
>
Left Join version:
> sqldf("select * from file1test f1 left join file2 f2
+ on (f1.BP > f2.BP_start and f1.BP<= f2.BP_end) ")
Output:
     SNP     BP  Gene BP_start BP_end
1 rs2343 860269 E3543   860260 879955
2  rs211 369640  <NA>       NA     NA
3  rs754 861822 E3543   860260 879955
4  rs754 861822   E11   861322 879533
5  rs854 367934  E613   367640 368634
6  rs343 706940  <NA>       NA     NA
7  rs626 717244  <NA>       NA     NA
>
Note that you may want to be careful where you place the = if it matters which group a BP falls in when it exactly matches a BP_start or BP_end. As written, a BP equal to BP_start is excluded and a BP equal to BP_end is included.
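To make the boundary behaviour concrete, here is a small sketch with hypothetical one-row inputs (the data frames snps and genes and the SNP id rsX are made up for illustration). The SNP sits exactly on BP_start, so the strict comparison drops it, while SQL's BETWEEN, which includes both endpoints, keeps it.

```r
library(sqldf)

# Hypothetical inputs: one SNP positioned exactly at the start of one gene range.
snps  <- data.frame(SNP = "rsX", BP = 860260, stringsAsFactors = FALSE)
genes <- data.frame(Gene = "E3543", BP_start = 860260, BP_end = 879955,
                    stringsAsFactors = FALSE)

# Strict lower bound, inclusive upper bound (same condition as the joins above).
strict <- sqldf("select * from snps s inner join genes g
                 on (s.BP > g.BP_start and s.BP <= g.BP_end)")

# Inclusive on both ends: BETWEEN is equivalent to >= start and <= end.
inclusive <- sqldf("select * from snps s inner join genes g
                    on (s.BP between g.BP_start and g.BP_end)")

nrow(strict)     # the boundary SNP is not matched
nrow(inclusive)  # the boundary SNP is matched
```

If gene ranges can share an endpoint, an inclusive condition on both ends would assign a SNP on that shared position to both genes, so a half-open condition may be the safer choice in that case.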