R: merge based on multiple conditions (with non-equal criteria)

陌清茗 2021-01-01 05:26

I would like to merge 2 data frames based on multiple conditions.

DF1 <- data.frame("col1" = rep(c("A","B"), 18),
                  "col2" = rep(c


        
5 answers
  •  夕颜 (original poster)
    2021-01-01 05:34

    Recent versions of data.table (1.9.8 and later) support non-equi joins and update on join, so the matching data value from DF2 can be added to DF1 by reference:

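    library(data.table)
    # Update join: match DF1 rows to DF2 on col1, col2 and min <= value <= max,
    # then copy DF2's data column into DF1 by reference (i.data = the column from DF2)
    head(setDT(DF1)[setDT(DF2), on = c("col1", "col2", "value>=min", "value<=max"),
                    data := i.data])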
    
       rn col1 col2 value col4 data
    1:  1    A    C    22   NA    1
    2:  2    B    D    58   NA   NA
    3:  3    A    E    35   NA   NA
    4:  4    B    C    86   NA   NA
    5:  5    A    D    37   NA    3
    6:  6    B    E    16   NA   NA
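
The same pattern can be tried on a smaller case. This is a minimal sketch, assuming data.table is installed; the tables `x` and `ranges` and their columns (`id`, `grp`, `lo`, `hi`, `label`) are hypothetical names made up for illustration, not part of the question's data. It uses the `.()` form of `on`, which should be equivalent to the string form used above.

```r
library(data.table)

# Hypothetical toy tables (not the question's data)
x <- data.table(
  id    = 1:4,
  grp   = c("A", "A", "B", "A"),
  value = c(5L, 15L, 5L, 25L)
)
ranges <- data.table(
  grp   = c("A", "A"),
  lo    = c(0L, 11L),
  hi    = c(10L, 20L),
  label = c("low", "mid")
)

# Update join: for each row of `ranges`, find the rows of `x` with the same
# grp and lo <= value <= hi, then add `label` to `x` by reference.
# The i. prefix makes explicit that label comes from `ranges`.
x[ranges, on = .(grp, value >= lo, value <= hi), label := i.label]

print(x)
```

Rows 1 and 2 fall inside the two ranges for group A and receive "low" and "mid"; row 3 (group B) and row 4 (value 25, outside both ranges) stay NA. Both bounds are inclusive because the conditions use `>=` and `<=`. The ranges here do not overlap within a group; if they did, more than one row of `ranges` could match a single row of `x`, so it is worth checking for that before relying on the result.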
    

    Data

    DF1 <- structure(list(rn = 1:36, col1 = c("A", "B", "A", "B", "A", "B", 
    "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", 
    "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", 
    "A", "B", "A", "B"), col2 = c("C", "D", "E", "C", "D", "E", "C", 
    "D", "E", "C", "D", "E", "C", "D", "E", "C", "D", "E", "C", "D", 
    "E", "C", "D", "E", "C", "D", "E", "C", "D", "E", "C", "D", "E", 
    "C", "D", "E"), value = c(22L, 58L, 35L, 86L, 37L, 16L, 46L, 
    23L, 88L, 3L, 33L, 25L, 19L, 24L, 9L, 76L, 62L, 68L, 97L, 43L, 
    8L, 84L, 36L, 20L, 57L, 99L, 42L, 64L, 87L, 1L, 78L, 34L, 41L, 
    32L, 10L, 72L), col4 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("rn", 
    "col1", "col2", "value", "col4"), row.names = c(NA, -36L), class = "data.frame")
    DF2 <- structure(list(rn = 1:6, col1 = c("A", "A", "A", "A", "A", "A"
    ), col2 = c("C", "D", "C", "D", "C", "D"), data = c(1L, 3L, 1L, 
    3L, 1L, 3L), min = c(0L, 10L, 20L, 30L, 40L, 50L), max = c(10L, 
    20L, 30L, 40L, 50L, 60L)), .Names = c("rn", "col1", "col2", "data", 
    "min", "max"), row.names = c(NA, -6L), class = "data.frame")
    
