Merge by Range in R - Applying Loops

失恋的感觉 2020-11-28 12:59

I posted a question here, "Matched Range Merge in R", about merging two files based on whether a number in one file falls into a range in the second file. Thus far, I have been unsuc

3 answers
  •  一整个雨季
    2020-11-28 13:45

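    I believe what you're asking for is a conditional join (also called a non-equi or range join): rows are matched on an inequality condition rather than on equal keys, so no explicit loop is needed. Such joins are easy to write in SQL, and the sqldf package lets you query R data frames with SQL.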
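    If sqldf is not available, the same idea can be sketched in base R, swapping merge() in for SQL: merge(..., by = NULL) builds every SNP-by-gene pair, and the inequality then keeps only the pairs whose BP falls in the range. The sample data frames below are reconstructed from the outputs in this answer (the real file2 may hold more ranges), and range_inner_join() is a hypothetical helper name. The full cross product has nrow(file1test) * nrow(file2) rows, so this sketch only suits modest inputs.

```r
# Hypothetical sample data reconstructed from the outputs in this answer;
# the real file2 may contain additional gene ranges.
file1test <- data.frame(
  SNP = c("rs2343", "rs211", "rs754", "rs854", "rs343", "rs626"),
  BP  = c(860269, 369640, 861822, 367934, 706940, 717244),
  stringsAsFactors = FALSE
)
file2 <- data.frame(
  Gene     = c("E3543", "E11", "E613"),
  BP_start = c(860260, 861322, 367640),
  BP_end   = c(879955, 879533, 368634),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a base R stand-in for the INNER JOIN query.
range_inner_join <- function(snps, genes) {
  # by = NULL makes merge() return the Cartesian product of the two tables.
  pairs <- merge(snps, genes, by = NULL)
  # Same condition as the SQL: BP in the half-open interval (BP_start, BP_end].
  keep <- pairs$BP > pairs$BP_start & pairs$BP <= pairs$BP_end
  pairs[keep, , drop = FALSE]
}

hits <- range_inner_join(file1test, file2)
print(hits)
```

    As in the SQL version, a SNP inside two overlapping ranges (rs754 here) appears once per matching gene.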

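    Pick a version depending on how you want unmatched SNPs handled: the inner join drops SNPs that fall in no gene range, while the left join keeps them with NA in the gene columns.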

    Inner join version:

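    # file1test (SNP, BP) and file2 (Gene, BP_start, BP_end) are the data frames from the question
    library(sqldf)
    sqldf("select * from file1test f1 inner join file2 f2
           on (f1.BP > f2.BP_start and f1.BP <= f2.BP_end)")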
    

    Output:

         SNP     BP  Gene BP_start BP_end
    1 rs2343 860269 E3543   860260 879955
    2  rs754 861822 E3543   860260 879955
    3  rs754 861822   E11   861322 879533
    4  rs854 367934  E613   367640 368634
    

    Left Join version:

    sqldf("select * from file1test f1 left join file2 f2
           on (f1.BP > f2.BP_start and f1.BP <= f2.BP_end)")
    

    Output:

         SNP     BP  Gene BP_start BP_end
    1 rs2343 860269 E3543   860260 879955
    2  rs211 369640         NA     NA
    3  rs754 861822 E3543   860260 879955
    4  rs754 861822   E11   861322 879533
    5  rs854 367934  E613   367640 368634
    6  rs343 706940         NA     NA
    7  rs626 717244         NA     NA
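    The left join can be sketched in base R the same way, as an alternative to sqldf: compute the in-range pairs, then merge them back onto every SNP with all.x = TRUE so that unmatched SNPs get NA gene columns. The data is again reconstructed from the outputs in this answer, and range_left_join() is a hypothetical name. Unlike the SQL query, merge() sorts its result by the key columns by default, so the row order may differ from the output above.

```r
# Hypothetical sample data reconstructed from the outputs in this answer;
# the real file2 may contain additional gene ranges.
file1test <- data.frame(
  SNP = c("rs2343", "rs211", "rs754", "rs854", "rs343", "rs626"),
  BP  = c(860269, 369640, 861822, 367934, 706940, 717244),
  stringsAsFactors = FALSE
)
file2 <- data.frame(
  Gene     = c("E3543", "E11", "E613"),
  BP_start = c(860260, 861322, 367640),
  BP_end   = c(879955, 879533, 368634),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a base R stand-in for the LEFT JOIN query.
range_left_join <- function(snps, genes) {
  # Cartesian product of SNPs and gene ranges, then keep in-range pairs
  # using the same (BP_start, BP_end] condition as the SQL.
  pairs <- merge(snps, genes, by = NULL)
  hits <- pairs[pairs$BP > pairs$BP_start & pairs$BP <= pairs$BP_end, , drop = FALSE]
  # Re-attach every SNP (joining on SNP and BP); SNPs with no matching
  # range get NA in Gene, BP_start and BP_end.
  merge(snps, hits, by = names(snps), all.x = TRUE)
}

all_snps <- range_left_join(file1test, file2)
print(all_snps)
```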
    

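    Be careful where you place the =. As written, the condition f1.BP > f2.BP_start and f1.BP <= f2.BP_end includes a BP equal to BP_end but excludes one equal to BP_start. If it matters which range a BP falls in when it exactly matches a BP_start or BP_end, adjust the comparison operators accordingly.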
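    To illustrate why the placement of = matters, here is a small sketch with two hypothetical adjacent ranges, A and B, that share the endpoint 200. With the half-open condition used in the queries above (BP > BP_start and BP <= BP_end), a SNP at 200 lands only in A; with a closed interval (BP >= BP_start and BP <= BP_end, which is also what SQL's BETWEEN tests) it lands in both.

```r
# Two hypothetical adjacent ranges sharing the endpoint 200, and a SNP at 200.
genes <- data.frame(
  Gene     = c("A", "B"),
  BP_start = c(100, 200),
  BP_end   = c(200, 300),
  stringsAsFactors = FALSE
)
bp <- 200

# Half-open (BP_start, BP_end], as in the sqldf queries.
half_open_hits <- genes$Gene[bp > genes$BP_start & bp <= genes$BP_end]
# Closed [BP_start, BP_end]: both endpoints count.
closed_hits <- genes$Gene[bp >= genes$BP_start & bp <= genes$BP_end]

print(half_open_hits)  # [1] "A"
print(closed_hits)     # [1] "A" "B"
```

    The half-open form assigns a shared endpoint to exactly one range, which is usually what you want when ranges tile the genome without gaps.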
