Data.table - left outer join on multiple tables

时光取名叫无心 2020-12-10 08:18

Suppose you have data like

fruits <- data.table(FruitID=c(1,2,3), Fruit=c("Apple", "Banana", "Strawberry"))
colors <- data.table(ColorID=c(1,2,3,         


        
2 Answers
  • 2020-12-10 08:28

    I just committed a new feature to data.table, v1.9.5, that lets us join without setting keys (that is, specify the columns to join by directly, without having to call setkey() first).

    With that, this is simply:

    require(data.table) # v1.9.5+
    fruits[tastes, on="FruitID"][colors, on="FruitID"] # no setkey required
    #    FruitID      Fruit TasteID  Taste ColorID  Color
    # 1:       1      Apple       1 Sweeet       1    Red
    # 2:       1      Apple       2   Sour       1    Red
    # 3:       1      Apple       1 Sweeet       2 Yellow
    # 4:       1      Apple       2   Sour       2 Yellow
    # 5:       1      Apple       1 Sweeet       3  Green
    # 6:       1      Apple       2   Sour       3  Green
    # 7:       2         NA      NA     NA       4 Yellow
    # 8:       3 Strawberry       3  Sweet       5    Red
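
    Note that row 7 above has Fruit = NA: X[Y, on=] keeps every row of Y, so a fruit that is missing from tastes loses its name along the way. Below is a minimal sketch of a join that keeps every row of fruits instead. It assumes colors and tastes look as reconstructed from the printed output (their definitions are cut off in the question), and that a data.table release with on= is installed (v1.9.6 or later on CRAN, as far as I know).

    ```r
    library(data.table)

    # ASSUMPTION: colors and tastes are rebuilt from the printed output above,
    # because their definitions are truncated in the question.
    fruits <- data.table(FruitID = c(1, 2, 3),
                         Fruit   = c("Apple", "Banana", "Strawberry"))
    colors <- data.table(ColorID = 1:5,
                         Color   = c("Red", "Yellow", "Green", "Yellow", "Red"),
                         FruitID = c(1, 1, 1, 2, 3))
    tastes <- data.table(TasteID = 1:3,
                         Taste   = c("Sweeet", "Sour", "Sweet"),
                         FruitID = c(1, 1, 3))

    # X[Y, on=] returns one or more rows for every row of Y, so the table
    # whose rows must all survive goes in the i position.
    step1 <- colors[fruits, on = "FruitID"]   # fruits LEFT JOIN colors
    res   <- tastes[step1, on = "FruitID",
                    allow.cartesian = TRUE]   # ... LEFT JOIN tastes

    print(res)
    ```

    Here Banana keeps its name and only gets NA in the TasteID and Taste columns.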
    
  • 2020-12-10 08:39

    You could use base R's Reduce with left_join (from dplyr) to join the whole list of data.table objects at once, provided the tables share the column names to join on and you want to avoid setting keys repeatedly on each data.table object.

    library(data.table) # <= v1.9.4
    library(dplyr) # left_join
    
    Reduce(function(...) left_join(...), list(fruits,colors,tastes))
    
    # Source: local data table [8 x 6]
    
    #  FruitID      Fruit ColorID  Color TasteID  Taste
    #1       1      Apple       1    Red       1 Sweeet
    #2       1      Apple       1    Red       2   Sour
    #3       1      Apple       2 Yellow       1 Sweeet
    #4       1      Apple       2 Yellow       2   Sour
    #5       1      Apple       3  Green       1 Sweeet
    #6       1      Apple       3  Green       2   Sour
    #7       2     Banana       4 Yellow      NA     NA
    #8       3 Strawberry       5    Red       3  Sweet
    

    Another option is a pure data.table approach, as @Frank mentioned. Note that this requires the key to be set to FruitID on all of the data.table objects:

    library(data.table) # <= v1.9.4
    Reduce(function(x,y) y[x, allow.cartesian=TRUE], list(fruits,colors,tastes))
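
    For completeness, a minimal sketch of the keyed setup this relies on. The colors and tastes tables here are assumptions rebuilt from the printed output, since their definitions are cut off in the question; setkey() sorts each table by reference.

    ```r
    library(data.table)

    # ASSUMPTION: colors and tastes are rebuilt from the printed output,
    # because their definitions are truncated in the question.
    fruits <- data.table(FruitID = c(1, 2, 3),
                         Fruit   = c("Apple", "Banana", "Strawberry"))
    colors <- data.table(ColorID = 1:5,
                         Color   = c("Red", "Yellow", "Green", "Yellow", "Red"),
                         FruitID = c(1, 1, 1, 2, 3))
    tastes <- data.table(TasteID = 1:3,
                         Taste   = c("Sweeet", "Sour", "Sweet"),
                         FruitID = c(1, 1, 3))

    # Key every table on the join column (modifies the tables in place).
    setkey(fruits, FruitID)
    setkey(colors, FruitID)
    setkey(tastes, FruitID)

    # y[x] looks up each row of the accumulated result x in the next table y,
    # so every row of fruits (the first list element) is kept.
    res <- Reduce(function(x, y) y[x, allow.cartesian = TRUE],
                  list(fruits, colors, tastes))

    print(res)
    ```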
    