I have two data.tables with many fields.
I want to join the two tables, add some calculated fields, and append all other fields from the first table, the second table, or both.
This should answer your question precisely.
It uses a very powerful R feature called computing on the language (or metaprogramming), which is well described in the official R Language Definition manual. In my opinion this is an exceptional feature of the R language and should not be forgotten.
library(data.table)
DT1 = data.table(x=c("c", "a", "b", "a", "b"), a=1:5)
DT2 = data.table(x=c("d", "c", "b"), b=6:8)
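# Build the call .(sum = a + b, x, a, b): the calculated column first,
# then every column name found in either table
jj = as.call(c(
  list(as.name(".")),
  list(sum = quote(a+b)),
  lapply(unique(c(names(DT1), names(DT2))), as.name)
))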
print(jj)
#.(sum = a + b, x, a, b)
# Right join on x (one row per row of DT2), evaluating the built call in j
DT1[DT2, eval(jj), on="x"]
# sum x a b
#1: NA d NA 6
#2: 8 c 1 7
#3: 11 b 3 8
#4: 13 b 5 8
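The same idea can be wrapped in a small function so that the calculated columns are passed in as quoted expressions. This is only a sketch: `join_calc()` and its arguments are hypothetical names, not part of data.table. It assumes the two tables share no column names other than the join key; with a clash, a bare name in `j` refers to the first table's column and the second table's column would be dropped.

```r
library(data.table)

# Hypothetical helper (not part of data.table): builds a j call like `jj`
# for any two tables and any named list of quoted calculated columns,
# then evaluates it in a right join of `lhs` on `rhs`.
join_calc <- function(lhs, rhs, on_cols, calc) {
  # every column name found in either table, as symbols
  cols <- lapply(unique(c(names(lhs), names(rhs))), as.name)
  # .(<calculated columns>, <all other columns>)
  jcall <- as.call(c(list(as.name(".")), calc, cols))
  lhs[rhs, eval(jcall), on = on_cols]
}

DT1 <- data.table(x = c("c", "a", "b", "a", "b"), a = 1:5)
DT2 <- data.table(x = c("d", "c", "b"), b = 6:8)

res <- join_calc(DT1, DT2, on_cols = "x",
                 calc = list(sum = quote(a + b), diff = quote(b - a)))
print(res)
```

The calculated columns come first, followed by `x`, `a` and `b`, exactly as in `jj`.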
You can also keep only the columns of DT2 that you need before joining:
DT1 = data.table(x=c("c", "a", "b", "a", "b"), a=1:5, d=rnorm(5))
DT2 = data.table(x=c("d", "c", "b"), b=6:8, c=letters[3])
# Keep only x and b from DT2, join on x, then add the sum column
DT3 <- DT1[DT2[, .(x, b)], on="x"][, sum := a+b]
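When the columns to keep are held in a character vector, recent versions of data.table let you select them with the `..` prefix, which looks the variable up in the calling scope. A sketch under that assumption; `keep` is an illustrative name, and `set.seed()` is added only so the `rnorm()` column is reproducible.

```r
library(data.table)

set.seed(1)  # rnorm() is random; fix the seed for reproducibility
DT1 <- data.table(x = c("c", "a", "b", "a", "b"), a = 1:5, d = rnorm(5))
DT2 <- data.table(x = c("d", "c", "b"), b = 6:8, c = letters[3])

# Columns of DT2 to carry into the join (join key included)
keep <- c("x", "b")

# `..keep` means: use the value of `keep` from the calling scope
# as a vector of column names
DT3 <- DT1[DT2[, ..keep], on = "x"][, sum := a + b]
print(DT3)
```

Column `c` of DT2 is dropped, while every column of DT1 is kept.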
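I'm more certain of my answer to the second part of your question, so I'll answer that first. If you only want the columns of one table (SQL's DT1.* or DT2.*) plus the additional column new = a+b, I would use an update join, shown here for DT1. Note that := adds the column to DT1 itself, by reference: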
DT1[DT2,new:=a+b,on="x"]
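A short sketch of what this update join produces with the small tables from the first answer. It works on a `copy()` so the original DT1 is left unchanged; drop `copy()` to update DT1 in place. The result keeps the rows of DT1 (not DT2), and DT1 rows with no match in DT2 get `NA`.

```r
library(data.table)

DT1 <- data.table(x = c("c", "a", "b", "a", "b"), a = 1:5)
DT2 <- data.table(x = c("d", "c", "b"), b = 6:8)

# Work on a copy so that DT1 itself is not modified by :=
res <- copy(DT1)
# `a` comes from res; `b` is found in DT2 because res has no `b` column
res[DT2, new := a + b, on = "x"]
print(res)
```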
For the first part, where you need both DT1.* and DT2.*, the only approach I can think of is to join first and then add the column:
DT1[DT2, on="x"][,new := a+b]
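For comparison, the same result can be produced in a single `[` call by listing the columns in `j`, which is what the `eval(jj)` answer automates. This sketch uses the small example tables and makes no claim about which form is faster.

```r
library(data.table)

DT1 <- data.table(x = c("c", "a", "b", "a", "b"), a = 1:5)
DT2 <- data.table(x = c("d", "c", "b"), b = 6:8)

# Two steps: join first, then add `new` by reference to the join result
two_step <- DT1[DT2, on = "x"][, new := a + b]

# One step: compute `new` in j while joining; the join column x carries
# DT2's values, so the unmatched key "d" is kept
one_step <- DT1[DT2, .(x, a, b, new = a + b), on = "x"]

print(two_step)
print(one_step)
```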
However, there might be more efficient code to achieve this.