R data.table join: SQL “select *”-like syntax for joined tables?

-上瘾入骨i 2020-12-19 04:24

I have two data.tables with many fields.

I want to join the two tables, add some calculated fields, and append all other fields from the first table, the second table, or both.

3 Answers
  • 2020-12-19 04:49

    This should answer your need precisely.
    It uses a very powerful R feature called computing on the language (or metaprogramming), which is well described in the official R Language Definition manual. This is an exceptional feature of the R language and should not be forgotten, IMO.

    library(data.table)
    DT1 = data.table(x=c("c", "a", "b", "a", "b"), a=1:5)
    DT2 = data.table(x=c("d", "c", "b"), b=6:8)
    
    jj = as.call(c(
        list(as.name(".")),
        list(sum = quote(a+b)),
        lapply(unique(c(names(DT1), names(DT2))), as.name)
    ))
    print(jj)
    #.(sum = a + b, x, a, b)
    DT1[DT2, eval(jj), on="x"]
    #   sum x  a b
    #1:  NA d NA 6
    #2:   8 c  1 7
    #3:  11 b  3 8
    #4:  13 b  5 8
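
    The same construction can be wrapped in a small function when there are several computed fields. The following is only a sketch: `select_all` is a hypothetical helper name, not part of data.table, and it assumes that the only column name shared by the two tables is the join key.

```r
library(data.table)

DT1 <- data.table(x = c("c", "a", "b", "a", "b"), a = 1:5)
DT2 <- data.table(x = c("d", "c", "b"), b = 6:8)

# Hypothetical helper: build the call .(<computed fields>, <all columns>)
# from a named list of quoted expressions and any number of tables.
select_all <- function(computed, ...) {
  cols <- unique(unlist(lapply(list(...), names)))
  as.call(c(list(as.name(".")), computed, lapply(cols, as.name)))
}

jj <- select_all(list(sum = quote(a + b), diff = quote(b - a)), DT1, DT2)
print(jj)
# .(sum = a + b, diff = b - a, x, a, b)

res <- DT1[DT2, eval(jj), on = "x"]
# res has the columns sum, diff, x, a, b and one row per match (4 rows here)
```

    If a non-key column name occurred in both tables, the bare name would be ambiguous; in that case the `x.` and `i.` prefixes that data.table accepts in `j` during a join could be used to pick a side explicitly.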
    
  • 2020-12-19 05:07

    You can keep only the columns in DT2 that you need:

    library(data.table)
    DT1 = data.table(x=c("c", "a", "b", "a", "b"), a=1:5, d=rnorm(5))
    DT2 = data.table(x=c("d", "c", "b"), b=6:8, c=letters[3])
    
    # Subset DT2 to the needed columns, join on x, then add the computed column
    DT3 <- DT1[DT2[, .(x, b)], on="x"][, sum := a+b]
    
  • 2020-12-19 05:09

    I'm more certain of my answer to the second part of your question, so I'll answer that first. If you only want DT1.* or DT2.*, plus the additional column new = a+b, I would do it this way (this adds the column to DT1 by reference):

    DT1[DT2,new:=a+b,on="x"]
    

    For the first part, where you need DT1.* and DT2.*, the only answer I can think of is:

    DT1[DT2, on="x"][,new := a+b]
    

    However, there might be more efficient code to achieve this.
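
    To make the difference between the two forms concrete, here is a minimal sketch using the example tables from the first answer (an assumption, since this answer does not define its own data). `copy()` is used only so that the original DT1 stays untouched.

```r
library(data.table)

DT1 <- data.table(x = c("c", "a", "b", "a", "b"), a = 1:5)
DT2 <- data.table(x = c("d", "c", "b"), b = 6:8)

# Update join: keeps every row of DT1 and only DT1's columns, plus `new`.
# Rows of DT1 without a match in DT2 get NA in `new`.
upd <- copy(DT1)[DT2, new := a + b, on = "x"]
# columns: x, a, new; 5 rows; new = 8, NA, 11, NA, 13

# Join, then add: one row per match for each row of DT2,
# with the columns of both tables, plus `new`.
both <- DT1[DT2, on = "x"][, new := a + b]
# columns: x, a, b, new; 4 rows; new = NA, 8, 11, 13
```

    The first form changes its table in place and never adds rows; the second builds a new table whose row count follows the join.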
