Join two data tables and use only one column from second dt

情深已故 2020-12-06 07:02

Let's say I have two data tables (dt1 and dt2), and I want to get dt3 using data.table. A, B, C, E, F, G, H are column names. The key of dt1 is column A, and the key of dt2 is column E. Data

2 Answers
  • 2020-12-06 07:30

    data.table solution

    setDT(dt1)[, H := dt2$H[match(A, dt2$E)]]
    
    #    A  B  C  H
    # 1: 1  4  7 16
    # 2: 2  5  8 17
    # 3: 3  6  9 18
    # 4: 2 20 21 17
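
    The lookup above can be traced step by step. This is only a sketch: the question's data is not shown in full, so the values of dt1 and dt2 below are assumptions reconstructed from the printed result, and the F and G values are made up for illustration.

    ```r
    library(data.table)

    # Assumed sample data (reconstructed from the printed result; F and G are hypothetical)
    dt1 <- data.table(A = c(1, 2, 3, 2), B = c(4, 5, 6, 20), C = c(7, 8, 9, 21))
    dt2 <- data.table(E = c(1, 2, 3), F = c(10, 11, 12), G = c(13, 14, 15), H = c(16, 17, 18))

    # For each value of dt1$A, match() returns the position of its first match in dt2$E
    idx <- match(dt1$A, dt2$E)
    idx
    # [1] 1 2 3 2

    # Indexing dt2$H by those positions lines the H values up with the rows of dt1
    dt1[, H := dt2$H[idx]]
    dt1$H
    # [1] 16 17 18 17
    ```

    A value of A with no counterpart in E would give NA in idx, and therefore NA in H.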
    

    Alternatively, a dplyr solution:

    library(dplyr)
    left_join(x = dt1, y = dt2, by = c("A" = "E")) %>%
      select(A, B, C, H)
    
  • 2020-12-06 07:36

    To left join onto dt1 and add only the H column from dt2, you can combine a binary join with the update-by-reference operator (:=)

    setkey(setDT(dt1), A) 
    dt1[dt2, H := i.H]
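
    A small sketch of what this does, using assumed data reconstructed from the result printed in the other answer (the F and G values are hypothetical). Two things are worth seeing: i.H refers to column H of the table in the i position (dt2), and setkey() physically sorts dt1 by A, so the row order changes.

    ```r
    library(data.table)

    # Assumed sample data; dt2 is keyed on E as stated in the question
    dt1 <- data.table(A = c(1, 2, 3, 2), B = c(4, 5, 6, 20), C = c(7, 8, 9, 21))
    dt2 <- data.table(E = c(1, 2, 3), F = c(10, 11, 12), G = c(13, 14, 15),
                      H = c(16, 17, 18), key = "E")

    setkey(dt1, A)       # sorts dt1 by A: the rows are now ordered 1, 2, 2, 3
    dt1[dt2, H := i.H]   # join on the keys; copy H from dt2 into the matching rows of dt1

    dt1$A
    # [1] 1 2 2 3
    dt1$H
    # [1] 16 17 17 18
    names(dt1)           # F and G were never copied over
    # [1] "A" "B" "C" "H"
    ```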
    

    See here and here for detailed explanation on how it works


    With the devel version (v >= 1.9.5) we can make it even shorter by specifying the key within setDT (as pointed out by @Arun)

    setDT(dt1, key = "A")[dt2, H := i.H]
    

    Edit 24/7/2015

    You can now run a binary join using the new on parameter without setting keys

    setDT(dt1)[dt2, H := i.H, on = c(A = "E")]
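
    As a sketch of how the on version behaves, again with assumed data (the row with A = 4 is added here only to illustrate an unmatched value, and F and G are hypothetical): no key is set, the row order of dt1 is left alone, and rows without a match in dt2 get NA.

    ```r
    library(data.table)

    # Assumed sample data; A = 4 has no counterpart in dt2$E
    dt1 <- data.table(A = c(1, 2, 3, 2, 4), B = c(4, 5, 6, 20, 30), C = c(7, 8, 9, 21, 31))
    dt2 <- data.table(E = c(1, 2, 3), F = c(10, 11, 12), G = c(13, 14, 15), H = c(16, 17, 18))

    # Join dt1$A to dt2$E and add H by reference; dt1 is already a data.table,
    # so setDT() is not needed here
    dt1[dt2, H := i.H, on = c(A = "E")]

    dt1$A       # original row order is preserved
    # [1] 1 2 3 2 4
    dt1$H       # the unmatched row gets NA
    # [1] 16 17 18 17 NA
    key(dt1)    # no key was set
    # NULL
    ```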
    