I have two data.tables: DT and meta. When I merge them using DT[meta], memory usage increases by more than 10 GB (and the merge is ver
Maybe other functions would work better, like merge() or cbind().
My bad. The problem was that keys were not unique:
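library(data.table)
# Both tables have a duplicated key value (x = 1 appears twice in each)
a <- data.table(x = c(1, 1), y = c(1, 2))
b <- data.table(x = c(1, 1), y = c(3, 4))
setkey(a, x)
setkey(b, x)
# Every row of b matches both rows of a, so the join returns 2 x 2 = 4 rows
a[b]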
     x y y.1
[1,] 1 1   3
[2,] 1 2   3
[3,] 1 1   4
[4,] 1 2   4
It would be nice if data.table could give a warning for that.
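In the meantime, the duplicates can be spotted by hand before joining. The following is only a rough sketch using the toy tables a and b from above: it counts rows per key value and estimates how many matched rows a join on x would return. The names dup_a, dup_b, n_a, n_b and expected_rows are made up for illustration.

```r
library(data.table)

a <- data.table(x = c(1, 1), y = c(1, 2))
b <- data.table(x = c(1, 1), y = c(3, 4))

# Count rows per key value; any N > 1 means the key is not unique
dup_a <- a[, .N, by = x][N > 1]
dup_b <- b[, .N, by = x][N > 1]

# Per-key row counts for each table (one row per distinct x)
cnt_a <- a[, .(n_a = .N), by = x]
cnt_b <- b[, .(n_b = .N), by = x]

# Matched rows the join would return: sum over shared keys of n_a * n_b
expected_rows <- merge(cnt_a, cnt_b, by = "x")[, sum(n_a * n_b)]
expected_rows
```

If expected_rows is much larger than either table, the join is probably misspecified, which was exactly my case with the real data.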
Update from Matthew
This warning has now been implemented in v1.8.7:
New argument allow.cartesian (default FALSE) added to X[Y] and merge(X,Y), #2464. Prevents large allocations due to misspecified joins; e.g., duplicate key values in Y joining to the same group in X over and over again. The word cartesian is used loosely for when more than max(nrow(X),nrow(Y)) rows would be returned. The error message is verbose and includes advice.