R dplyr join by range or virtual column

独厮守ぢ 2020-12-10 16:33

I want to join two tibbles by a range or a virtual column, but it seems the by parameter only accepts chr or vector(chr) o

6 answers
  •  感情败类
    2020-12-10 17:07

    I don't think inequality joins are implemented in dplyr yet, or that they ever will be (see this discussion on Join on inequality constraints), but this is a good situation to use an SQL join:

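    library(tibble)
    library(sqldf)
    
    # Range join in SQL: keep rows where from <= value < to.
    # as_tibble() replaces the deprecated as.tibble().
    as_tibble(sqldf("select d.value, r.class from d
                    join r on d.value >= r.'from' and
                              d.value < r.'to'"))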
    

    Alternatively, if you want to integrate the join into your dplyr chain, you can use fuzzyjoin::fuzzy_join:

    library(dplyr)
    library(fuzzyjoin)
    
    d %>%
      fuzzy_join(r, by = c("value" = "from", "value" = "to"), 
                 match_fun = list(`>=`, `<`)) %>%
      select(value, class)
    

    Result:

    # A tibble: 31 x 2
       value class
        
     1   1.0     A
     2   1.2     A
     3   1.4     A
     4   1.6     A
     5   1.8     A
     6   2.0     A
     7   2.0     B
     8   2.2     B
     9   2.4     B
    10   2.6     B
    # ... with 21 more rows
    

    Note that I put single quotes around from and to because both are reserved words in SQL.
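
    To make the join condition itself concrete, here is a minimal base R sketch of the same half-open range match (value >= from and value < to) using a cross join followed by a filter. This is a swapped-in technique, not what sqldf or fuzzyjoin do internally, and the contents of d and r below are hypothetical, since the question's actual data is not shown.

```r
# Hypothetical data: the real d and r from the question are not shown.
d <- data.frame(value = c(1.0, 1.5, 2.0, 3.9, 4.0, 5.5))
r <- data.frame(from  = c(1, 2, 4),
                to    = c(2, 4, 6),
                class = c("A", "B", "C"),
                stringsAsFactors = FALSE)

# Cross join: by = NULL pairs every row of d with every row of r.
pairs <- merge(d, r, by = NULL)

# Keep only the pairs where value lies in the half-open interval [from, to).
in_range <- pairs$value >= pairs$from & pairs$value < pairs$to
hits <- pairs[in_range, c("value", "class")]

# Sort by value and reset row names for readable output.
hits <- hits[order(hits$value), ]
rownames(hits) <- NULL
print(hits)
```

    The cross join builds nrow(d) * nrow(r) intermediate rows, so this is only reasonable for small lookup tables; for anything larger the SQL or fuzzyjoin versions above are probably the better fit.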
