Merge multiple data tables with duplicate column names

感情败类 2020-12-29 10:19

I am trying to merge (join) multiple data tables (obtained with fread from 5 csv files) to form a single data table. I get an error when I try to merge 5 data tables, but wo

6 Answers
  •  醉酒成梦
    2020-12-29 11:00

    If it's just those 5 data.tables (where the x column holds the same values in every one of them), you could also use nested joins:

    # set the key for each datatable to 'x'
    setkey(DT1,x)
    setkey(DT2,x)
    setkey(DT3,x)
    setkey(DT4,x)
    setkey(DT5,x)
    
    # the nested join
    mergedDT1 <- DT1[DT2[DT3[DT4[DT5]]]]
    

    Or as @Frank said in the comments:

    DTlist <- list(DT1,DT2,DT3,DT4,DT5)
    Reduce(function(X,Y) X[Y], DTlist)
    

    which gives:

       x y1 y2 y3 y4 y5
    1: a 10 11 12 13 14
    2: b 11 12 13 14 15
    3: c 12 13 14 15 16
    4: d 13 14 15 16 17
    5: e 14 15 16 17 18
    6: f 15 16 17 18 19
    

    This gives the same result as:

    mergedDT2 <- Reduce(function(...) merge(..., all = TRUE, by = "x"), list(DT1, DT2, DT3, DT4, DT5))
    
    > identical(mergedDT1,mergedDT2)
    [1] TRUE
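
    Both variants above depend on the keys set with setkey. As a rough sketch of a keyless alternative, assuming data.table 1.9.6 or later, the join column can instead be named through the `on` argument of `[`. The name `mergedOn` is only illustrative, and the tables are the ones listed under "Used data":

    ```r
    library(data.table)

    # Same sample tables as in the "Used data" section, without any key
    DT1 <- data.table(x = letters[1:6], y1 = 10:15)
    DT2 <- data.table(x = letters[1:6], y2 = 11:16)
    DT3 <- data.table(x = letters[1:6], y3 = 12:17)
    DT4 <- data.table(x = letters[1:6], y4 = 13:18)
    DT5 <- data.table(x = letters[1:6], y5 = 14:19)

    DTlist <- list(DT1, DT2, DT3, DT4, DT5)

    # Join pairwise on column "x"; `on` replaces the setkey() calls
    mergedOn <- Reduce(function(X, Y) X[Y, on = "x"], DTlist)
    ```

    The values should match the table shown above, but the result carries no key, so identical() against the keyed versions is not expected to return TRUE.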
    

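    When the x columns do not contain the same values, a nested join will not give the desired result, because X[Y] returns the rows of Y, so only the x values of the innermost table (DT6 here) survive: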

    DT1[DT2[DT3[DT4[DT5[DT6]]]]]
    

    this gives:

       x y1 y2 y3 y4 y5 y6
    1: b 11 12 13 14 15 15
    2: c 12 13 14 15 16 16
    3: d 13 14 15 16 17 17
    4: e 14 15 16 17 18 18
    5: f 15 16 17 18 19 19
    6: g NA NA NA NA NA 20
    

    While:

    Reduce(function(...) merge(..., all = TRUE, by = "x"), list(DT1, DT2, DT3, DT4, DT5, DT6))
    

    gives:

       x y1 y2 y3 y4 y5 y6
    1: a 10 11 12 13 14 NA
    2: b 11 12 13 14 15 15
    3: c 12 13 14 15 16 16
    4: d 13 14 15 16 17 17
    5: e 14 15 16 17 18 18
    6: f 15 16 17 18 19 19
    7: g NA NA NA NA NA 20
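
    The difference between the two results can be reproduced on a smaller scale. The following is a minimal sketch with two hypothetical tables, `A` and `B`, that are not part of the original data: `A[B]` looks up the rows of `B` in `A`, while merge with all = TRUE performs a full outer join.

    ```r
    library(data.table)

    # Hypothetical two-table example (names A and B are assumptions)
    A <- data.table(x = c("a", "b"), y1 = 1:2)
    B <- data.table(x = c("b", "c"), y2 = 3:4)

    # A[B]: one row per row of B, so "a" is dropped and "c" gets NA for y1
    right_join <- A[B, on = "x"]

    # merge(all = TRUE): full outer join, keeps "a", "b" and "c"
    full_join <- merge(A, B, by = "x", all = TRUE)
    ```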
    

    Used data:

    In order to make the code with Reduce work, I changed the names of the y columns.

    library(data.table)

    DT1 <- data.table(x = letters[1:6], y1 = 10:15)
    DT2 <- data.table(x = letters[1:6], y2 = 11:16)
    DT3 <- data.table(x = letters[1:6], y3 = 12:17)
    DT4 <- data.table(x = letters[1:6], y4 = 13:18)
    DT5 <- data.table(x = letters[1:6], y5 = 14:19)
    
    DT6 <- data.table(x = letters[2:7], y6 = 15:20, key="x")
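
    The renaming mentioned above can also be done programmatically when the tables arrive with clashing names. This is only a sketch under the assumption that every table has a value column called `y`; the sample values and the names `DTlist` and `merged` are made up for illustration:

    ```r
    library(data.table)

    # Hypothetical tables that all share the column name "y"
    DTlist <- list(
      data.table(x = letters[1:3], y = 1:3),
      data.table(x = letters[1:3], y = 4:6),
      data.table(x = letters[1:3], y = 7:9)
    )

    # Rename "y" to y1, y2, ... by reference so the names no longer clash
    for (i in seq_along(DTlist)) {
      setnames(DTlist[[i]], "y", paste0("y", i))
    }

    # Full outer join of all tables on "x"
    merged <- Reduce(function(X, Y) merge(X, Y, by = "x", all = TRUE), DTlist)
    ```

    Because setnames works by reference, the tables in the list are modified in place; copy them first with copy() if the original names are still needed.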
    
