Merge dataframes on matching A, B and *closest* C?

走了就别回头了 2020-12-05 11:14

I have two dataframes like so:

set.seed(1)
df <- cbind(expand.grid(x=1:3, y=1:5), time=round(runif(15)*30))
to.merge <- data.frame(x=c(2, 2, 2, 3, 2),
         


        
3 Answers
  •  盖世英雄少女心
    2020-12-05 11:58

    mnel's answer uses roll = "nearest" in a data.table join but does not limit the match to +/- 1 as requested by the OP. In addition, MichaelChirico has suggested using the on parameter.

    This approach uses

    • roll = "nearest",
    • an update by reference, i.e., without copying,
    • setDT() to coerce a data.frame to data.table without copying (introduced 2014-02-27 with v.1.9.2 of data.table),
    • the on parameter, which avoids having to set a key explicitly (introduced 2015-09-19 with v.1.9.6).
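
    As a minimal sketch of why the distance check is needed on top of roll = "nearest": the rolling join always picks the closest row within the same group, however far away it is. The tables lookup and query below are hypothetical toy data, not the OP's df and to.merge.

```r
library(data.table)

# Hypothetical toy tables (assumed for illustration only)
lookup <- data.table(id = c(1, 1, 2), time = c(10, 20, 5), val = c("a", "b", "c"))
query  <- data.table(id = c(1, 1, 2), time = c(11, 17, 30))

# For each row of query, find the row of lookup with the same id and the
# nearest time. x.time is the matched time from lookup, i.time is the time
# from query; matches further away than 1 are blanked out with NA.
res <- lookup[query, on = .(id, time), roll = "nearest",
              .(id = i.id,
                query_time   = i.time,
                matched_time = x.time,
                val = replace(val, abs(x.time - i.time) > 1, NA))]
res
```

    Here every query row finds a partner (matched_time is 10, 20 and 5), but only the first one lies within +/- 1, so only that row keeps its val.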

    So, the code below

    library(data.table)   # version 1.11.4 used
    setDT(df)[setDT(to.merge), on  = .(x, y, time), roll = "nearest",
              val := replace(val, abs(x.time - i.time) > 1, NA)]
    df
    

    has updated df:

        x y time  val
     1: 1 1    8 <NA>
     2: 2 1   11    c
     3: 3 1   17 <NA>
     4: 1 2   27 <NA>
     5: 2 2    6 <NA>
     6: 3 2   27 <NA>
     7: 1 3   28 <NA>
     8: 2 3   20 <NA>
     9: 3 3   19 <NA>
    10: 1 4    2 <NA>
    11: 2 4    6 <NA>
    12: 3 4    5 <NA>
    13: 1 5   21 <NA>
    14: 2 5   12 <NA>
    15: 3 5   23    d
    

    Note that the order of rows has not been changed (in contrast to Chinmay Patil's answer).
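
    The update by reference can be sketched on a small, deliberately unsorted table to show that the row order stays as it was and that rows without a close enough match end up as NA. The tables base and extra are hypothetical toy data, not the OP's.

```r
library(data.table)

# Hypothetical toy tables (assumed for illustration only);
# base is deliberately not sorted by id or time.
base  <- data.table(id = c(2, 1, 1), time = c(5, 20, 10))
extra <- data.table(id = c(1, 2), time = c(10.5, 30), val = c("a", "c"))

# Update join: each row of extra looks up its nearest row in base (same id),
# and val is written into that row of base by reference. Because on = is
# used instead of setkey(), base is not reordered.
base[extra, on = .(id, time), roll = "nearest",
     val := replace(val, abs(x.time - i.time) > 1, NA)]
base
```

    Only the row with time 10 receives a value ("a", distance 0.5). The row with time 5 is the nearest match for 30 but is too far away, and the row with time 20 is never matched, so both show NA.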

    In case df must not be changed, a new data.table can be created by

    result <- setDT(to.merge)[setDT(df), on  = .(x, y, time), roll = "nearest",
                    .(x, y, time, val = replace(val, abs(x.time - i.time) > 1, NA))]
    result
    

    which returns the same result as above.
