Rbind with new columns and data.table

借酒劲吻你 2020-12-30 07:38

I need to append many large tables to an existing table, so I use rbind with the excellent package data.table. However, some of the later tables have more columns than the original.

5 answers
  •  死守一世寂寞
    2020-12-30 07:47

    The basic concept is to add the missing columns in both directions: from the running master table to newTable, and back the other way.

    As @menl pointed out in the comments, simply assigning an NA is a problem, because that will make the whole column of class logical.

    One solution is to force all new columns to a single type (e.g., as.numeric(NA)), but that is too restrictive.

    Instead, we need to check the class of each new column. We can then use as(NA, cc) (cc being the class) as the value assigned to the new column. We wrap this in an lapply statement on the RHS and use eval(columnName) on the LHS of := to assign.
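
    The typed-NA idea can be checked in isolation. The following is a minimal sketch, not part of the function below; the column names and classes in `classes` are hypothetical and chosen only for illustration.

    ```r
    library(methods)     # provides as(); attached by default in most R sessions
    library(data.table)

    # A bare NA is logical, which is why assigning it makes a logical column
    class(NA)

    # Hypothetical new columns and the classes they should have
    classes <- c(d = "integer", m = "character", n = "numeric")

    # One missing value per column, each of the requested class
    fillers <- lapply(classes, function(cc) as(NA, cc))

    # Assign them as new columns; each length-1 value is recycled down the rows
    dt <- data.table(a = 1:3)
    dt[, (names(classes)) := fillers]

    # Inspect the resulting column classes
    sapply(dt, class)
    ```

    Because each filler already has the target class, a later rbind does not need to coerce the column.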

    We can then wrap this in a function and use S3 methods so that we can simply call

    rbindFill(A, B)
    

    Below is the function.

    library(data.table)

    rbindFill.data.table <- function(master, newTable)  {
    # Append newTable to master.
    # Note: := adds the missing columns to both inputs by reference.
    
        # assign to Master
        #-----------------#
          # identify columns missing
          colMisng     <- setdiff(names(newTable), names(master))
    
          # if there are no columns missing, move on to next part
          if (!identical(colMisng, character(0)))  {
               # identify class of each
                colMisng.cls <- sapply(colMisng, function(x) class(newTable[[x]]))
    
                # assign to each column value of NA with appropriate class 
                master[ , eval(colMisng) := lapply(colMisng.cls, function(cc) as(NA, cc))]
              }
    
        # assign to newTable
        #-----------------#
          # identify columns missing
          colMisng     <- setdiff(names(master), names(newTable))
    
          # if there are no columns missing, move on to next part
          if (!identical(colMisng, character(0)))  {
            # identify class of each
            colMisng.cls <- sapply(colMisng, function(x) class(master[[x]]))
    
            # assign to each column value of NA with appropriate class 
            newTable[ , eval(colMisng) := lapply(colMisng.cls, function(cc) as(NA, cc))]
          }
    
        # reorder columns to avoid warning about ordering
        #-----------------#
          # both tables now hold the same set of columns, so master's order can be applied directly
          setcolorder(newTable, names(master))
    
        # rbind them! 
        #-----------------#
          rbind(master, newTable)
      }
    
      # implement generic function
      rbindFill <- function(x, y, ...) UseMethod("rbindFill")
    


    Example Usage:

        # Sample Data: 
        #--------------------------------------------------#
        A  <- data.table(a=1:3, b=1:3, c=1:3)
        A2 <- data.table(a=6:9, b=6:9, c=6:9)
        B  <- data.table(b=1:3, c=1:3, d=1:3, m=LETTERS[1:3])
        # n is random, so its values will differ between runs
        C  <- data.table(n=round(rnorm(3), 2), f=c(TRUE, FALSE, TRUE), c=7:9)
        #--------------------------------------------------#
    
        # Three successive calls to rbindFill
        master <- rbindFill(A, B)
        master <- rbindFill(master, A2)
        master <- rbindFill(master, C)
    
        # Results:
        master
        #      a  b c  d  m     n     f
        #  1:  1  1 1 NA NA    NA    NA
        #  2:  2  2 2 NA NA    NA    NA
        #  3:  3  3 3 NA NA    NA    NA
        #  4: NA  1 1  1  A    NA    NA
        #  5: NA  2 2  2  B    NA    NA
        #  6: NA  3 3  3  C    NA    NA
        #  7:  6  6 6 NA NA    NA    NA
        #  8:  7  7 7 NA NA    NA    NA
        #  9:  8  8 8 NA NA    NA    NA
        # 10:  9  9 9 NA NA    NA    NA
        # 11: NA NA 7 NA NA  0.86  TRUE
        # 12: NA NA 8 NA NA -1.15 FALSE
        # 13: NA NA 9 NA NA  1.10  TRUE
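
    For comparison, data.table itself can fill missing columns when binding. The sketch below is an alternative to rbindFill, not the function above, and assumes an installed data.table version whose rbindlist() supports the fill argument. The values in column n are fixed here purely for illustration.

    ```r
    library(data.table)

    A <- data.table(a = 1:3, b = 1:3, c = 1:3)
    B <- data.table(b = 1:3, c = 1:3, d = 1:3, m = LETTERS[1:3])
    C <- data.table(n = c(0.5, -1.2, 1.1), f = c(TRUE, FALSE, TRUE), c = 7:9)

    # Match columns by name and fill the ones a table lacks with NA
    combined <- rbindlist(list(A, B, C), use.names = TRUE, fill = TRUE)

    # Filled columns keep the class of the table that supplied them
    sapply(combined, class)

    # Unlike the := approach, the inputs are left unchanged
    names(A)
    ```

    Binding the whole list in one call also avoids growing the master table one rbind at a time.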
    
