Suppose you have data like
fruits <- data.table(FruitID=c(1,2,3), Fruit=c("Apple", "Banana", "Strawberry"))
colors <- data.table(ColorID=c(1,2,3,
I just committed a new feature to data.table v1.9.5 that lets you join without setting keys: you specify the join columns directly through on=, with no prior call to setkey().
With that, this is simply:
require(data.table) # v1.9.5+
fruits[tastes, on="FruitID"][colors, on="FruitID"] # no setkey required
# FruitID Fruit TasteID Taste ColorID Color
# 1: 1 Apple 1 Sweeet 1 Red
# 2: 1 Apple 2 Sour 1 Red
# 3: 1 Apple 1 Sweeet 2 Yellow
# 4: 1 Apple 2 Sour 2 Yellow
# 5: 1 Apple 1 Sweeet 3 Green
# 6: 1 Apple 2 Sour 3 Green
# 7: 2 NA NA NA 4 Yellow
# 8: 3 Strawberry 3 Sweet 5 Red
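Row 7 of the output above has NA in the Fruit column because X[Y, on=...] returns one result row for every row of Y, not of X. A minimal sketch of that behaviour follows; the contents of tastes here are assumed for illustration, since the original table definitions are not shown in full.

```r
library(data.table)  # needs a version that supports on= (v1.9.5+ per the answer)

# Toy tables; the tastes rows are assumed for illustration
fruits <- data.table(FruitID = c(1, 2, 3),
                     Fruit = c("Apple", "Banana", "Strawberry"))
tastes <- data.table(TasteID = c(1, 2, 3),
                     FruitID = c(1, 1, 3),
                     Taste = c("Sweet", "Sour", "Sweet"))

# fruits[tastes]: one row per row of tastes, so Banana (no taste) is absent
by_taste <- fruits[tastes, on = "FruitID"]

# tastes[fruits]: at least one row per fruit, so Banana is kept
# with NA in the taste columns
by_fruit <- tastes[fruits, on = "FruitID"]

print(by_taste)
print(by_fruit)
```

If every fruit should survive the chain, it is probably simpler to put fruits in the i position (inside the brackets) at each step.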
You could use base R's Reduce with left_join (from dplyr) to join the whole list of data.table objects at once, provided the tables share a common column name to join on and you want to avoid setting keys repeatedly on each data.table object.
library(data.table) # <= v1.9.4
library(dplyr) # left_join
Reduce(function(...) left_join(...), list(fruits,colors,tastes))
# Source: local data table [8 x 6]
# FruitID Fruit ColorID Color TasteID Taste
#1 1 Apple 1 Red 1 Sweeet
#2 1 Apple 1 Red 2 Sour
#3 1 Apple 2 Yellow 1 Sweeet
#4 1 Apple 2 Yellow 2 Sour
#5 1 Apple 3 Green 1 Sweeet
#6 1 Apple 3 Green 2 Sour
#7 2 Banana 4 Yellow NA NA
#8 3 Strawberry 5 Red 3 Sweet
Another option is a pure data.table approach, as @Frank mentioned. (Note that this requires the key of every data.table object to be set to FruitID.)
library(data.table) # <= v1.9.4
Reduce(function(x,y) y[x, allow.cartesian=TRUE], list(fruits,colors,tastes))
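The same Reduce pattern can presumably be combined with the on= argument from the first answer, which removes the need to key each table. The sketch below assumes a data.table version that supports on=; the contents of colors and tastes are reconstructed for illustration and are not the original definitions.

```r
library(data.table)  # assumes a version with on= support (v1.9.5+)

# Toy tables; colors and tastes contents are assumed for illustration
fruits <- data.table(FruitID = c(1, 2, 3),
                     Fruit = c("Apple", "Banana", "Strawberry"))
colors <- data.table(ColorID = c(1, 2, 3, 4, 5),
                     FruitID = c(1, 1, 1, 2, 3),
                     Color = c("Red", "Yellow", "Green", "Yellow", "Red"))
tastes <- data.table(TasteID = c(1, 2, 3),
                     FruitID = c(1, 1, 3),
                     Taste = c("Sweet", "Sour", "Sweet"))

# y[x, on = ...] keeps every row of the accumulated result x,
# so fruits without a color or taste are retained with NA
joined <- Reduce(
  function(x, y) y[x, on = "FruitID", allow.cartesian = TRUE],
  list(fruits, colors, tastes)
)

print(joined)
```

Because the accumulated table sits in the i position at each step, every fruit from the first table is kept, which matches the left_join result rather than the chained on= example.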