I have a data.table named dtA:
My actual dtA has 62871932 rows and 3 columns:
date company value
19810
I think I know how to solve this:
In dtB, I add a pointer column using data.table syntax:
dtB[, pointer := 1]
dtB will then look like this:
date company value pointer
198101 A 2 1
198102 B 5 1
Then I use the LEFT OUTER JOIN method from here: https://rstudio-pubs-static.s3.amazonaws.com/52230_5ae0d25125b544caab32f75f0360e775.html
setkey(dtA, date, company, value)
setkey(dtB, date, company, value)
dtA = merge(dtA, dtB, all.x = TRUE)
This means that, in the pointer column, a row of dtA that also exists in dtB gets 1, and a row of dtA that does not exist in dtB gets NA.
The result will be:
date company value pointer
198101 A 1 NA
198101 A 2 1
198101 B 5 NA
198102 A 2 NA
198102 B 5 1
198102 B 6 NA
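As a minimal sketch, the join step can be reproduced on the small example. The dtA rows here are read back from the result table above and the dtB rows from the table with the pointer column; `joined` is only an illustrative name, and the `by` columns are spelled out instead of relying on the keys.

```r
library(data.table)

# Sample data reconstructed from the tables in this post
dtA <- data.table(
  date    = c(198101L, 198101L, 198101L, 198102L, 198102L, 198102L),
  company = c("A", "A", "B", "A", "B", "B"),
  value   = c(1L, 2L, 5L, 2L, 5L, 6L)
)
dtB <- data.table(
  date    = c(198101L, 198102L),
  company = c("A", "B"),
  value   = c(2L, 5L)
)

# Flag every row of dtB
dtB[, pointer := 1]

# Left outer join: keep all rows of dtA, attach pointer where dtB has a match
joined <- merge(dtA, dtB, by = c("date", "company", "value"), all.x = TRUE)
print(joined)
```

Rows of dtA without a match in dtB end up with NA in pointer, which is what the filtering step relies on.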
I then select the rows where pointer is NA and remove the pointer column:
dtA = dtA[is.na(pointer)][, pointer := NULL]
I get my result:
date company value
198101 A 1
198101 B 5
198102 A 2
198102 B 6
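For comparison, the same rows can apparently be obtained without the helper column by using data.table's anti-join form, `X[!Y, on = ...]`. This is a different technique from the merge-based approach above, shown only as a sketch on the small example (`result` is an illustrative name); it says nothing about how either approach behaves on the full 62871932-row table.

```r
library(data.table)

# Sample data reconstructed from the tables in this post
dtA <- data.table(
  date    = c(198101L, 198101L, 198101L, 198102L, 198102L, 198102L),
  company = c("A", "A", "B", "A", "B", "B"),
  value   = c(1L, 2L, 5L, 2L, 5L, 6L)
)
dtB <- data.table(
  date    = c(198101L, 198102L),
  company = c("A", "B"),
  value   = c(2L, 5L)
)

# Anti-join: keep the rows of dtA that have no match in dtB
# on all three columns
result <- dtA[!dtB, on = c("date", "company", "value")]
print(result)
```

On this sample, the four remaining rows are the same ones as in the result above.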