The data.table package provides many of the same table handling methods as SQL. If a table has a key, that key consists of one or more columns. But a table can't have more
Good question. Note the following (admittedly buried) in ?data.table:
When i is a data.table, x must have a key. i is joined to x using the key and the rows in x that match are returned. An equi-join is performed between each column in i to each column in x's key. The match is a binary search in compiled C in O(log n) time. If i has less columns than x's key then many rows of x may match to each row of i. If i has more columns than x's key, the columns of i not involved in the join are included in the result. If i also has a key, it is i's key columns that are used to match to x's key columns and a binary merge of the two tables is carried out.
So, the key here is that i doesn't have to be keyed. Only x must be keyed.
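The quoted passage also says that when i has fewer columns than x's key, several rows of x can match each row of i. Here is a minimal sketch of that case. The table DT and its columns grp, sub and val are made up for illustration and do not come from the question.

```r
library(data.table)

# Hypothetical table keyed on two columns (names are illustrative only)
DT <- data.table(grp = c("a", "a", "b", "b"),
                 sub = c(1L, 2L, 1L, 2L),
                 val = c(10, 20, 30, 40),
                 key = c("grp", "sub"))

# i has one column while x's key has two: only the first key column is
# used for matching, so one row of i can match several rows of x.
res <- DT[J("a")]
print(res)

# i supplies both key columns: the lookup is narrowed to grp == "b", sub == 2.
res2 <- DT[J("b", 2L)]
print(res2)
```

In both calls the i argument is unkeyed. Only DT carries a key.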
X2 <- data.table(id = 11:15, y_id = c(14,14,11,12,12), key="id")
id y_id
[1,] 11 14
[2,] 12 14
[3,] 13 11
[4,] 14 12
[5,] 15 12
Y2 <- data.table(id = 11:15, b = letters[1:5], key="id")
id b
[1,] 11 a
[2,] 12 b
[3,] 13 c
[4,] 14 d
[5,] 15 e
Y2[J(X2$y_id)] # binary search for each item of (unsorted and unkeyed) i
id b
[1,] 14 d
[2,] 14 d
[3,] 11 a
[4,] 12 b
[5,] 12 b
or,
Y2[SJ(X2$y_id)] # binary merge of keyed i, see ?SJ
id b
[1,] 11 a
[2,] 12 b
[3,] 12 b
[4,] 14 d
[5,] 14 d
identical(Y2[J(X2$y_id)], Y2[X2$y_id]) # a bare numeric i selects by row number, it is not a join
[1] FALSE
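The FALSE is worth unpacking. As far as I can tell, a plain numeric vector in i is treated as row numbers, while wrapping it in J() turns it into a lookup against the key. The sketch below rebuilds Y2 and uses a stand-alone integer vector ids (my own name, holding the same values as X2$y_id) so it runs on its own.

```r
library(data.table)

Y2  <- data.table(id = 11:15, b = letters[1:5], key = "id")
ids <- c(14L, 14L, 11L, 12L, 12L)   # same values as X2$y_id, stored as integers

by_key <- Y2[J(ids)]   # join: each value is looked up in the key column id
by_row <- Y2[ids]      # row subset: asks for rows 14, 14, 11, 12, 12

print(by_key)

# Y2 has only 5 rows, so every requested row number is out of range
# and comes back as a row of NAs.
print(by_row)
```

So if the intent is a lookup by key value, wrap the vector in J() (or SJ()) rather than passing it bare.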