Select a sequence of columns: `:` works but not `seq`

旧巷少年郎 2020-12-11 17:00

I'm trying to subset a dataset by selecting some columns from a data.table. However, my code does not work with some variations.

Here is a sample data.table

3 Answers
  • 2020-12-11 17:09

    The lesson I learned is to use list instead of c:

     DT[ ,list(ID,Capacity)]
     #---------------------------
         ID Capacity
      1:  1      483
      2:  2      703
      3:  3      924
      4:  4      267
      5:  5      588
     ---            
    196: 46      761
    197: 47      584
    198: 48      402
    199: 49      416
    200: 50      130
    

    It lets you drop those pesky quotation marks, and it also moves you toward seeing the j argument as an expression that is evaluated with the data.table itself as its environment.
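
As a minimal sketch of that idea, the snippet below treats j as an ordinary expression over the columns. The table here is a stand-in built from the first five rows of the output above, and the name HalfCapacity is made up for illustration.

```r
library(data.table)

# Stand-in table: the first five rows shown in the output above
DT <- data.table(ID = 1:5, Capacity = c(483, 703, 924, 267, 588))

# Bare column names inside list() are looked up in DT itself
two_cols <- DT[, list(ID, Capacity)]

# Because j is evaluated as an expression, it can also compute on columns
halved   <- DT[, list(ID, HalfCapacity = Capacity / 2)]  # HalfCapacity is a hypothetical name
mean_cap <- DT[, mean(Capacity)]

print(two_cols)
print(halved)
print(mean_cap)
```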

    To 'get' the named columns by number, use the mget function together with the names function. R 'names' are language elements, i.e., symbols that refer to data objects on the search path from the current environment. Column names of data frames are not actually R names; they are character values. So you need a function that takes a character value and makes the interpreter treat it as a name to look up. The `[.data.table` function treats column names in j as language objects, rather than as character values the way `[.data.frame` does:

    DT[ ,mget(names(DT)[c(1,2)])]
         ID Capacity
      1:  1      483
      2:  2      703
      3:  3      924
      4:  4      267
      5:  5      588
     ---            
    196: 46      761
    197: 47      584
    198: 48      402
    199: 49      416
    200: 50      130
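
A self-contained sketch of the two approaches in this answer, for anyone who wants to run them side by side. The question's sample table is not shown, so the DT built here is a hypothetical stand-in (200 rows, IDs 1 to 50 repeated, random Capacity values, plus an extra made-up Code column); its numbers will not match the output above.

```r
library(data.table)

# Hypothetical stand-in for the question's table (the original sample is not shown)
set.seed(1)
DT <- data.table(
  ID       = rep(1:50, 4),
  Capacity = sample(100:1000, 200, replace = TRUE),
  Code     = sample(LETTERS[1:4], 200, replace = TRUE)
)

# Select by bare column names
by_list <- DT[, list(ID, Capacity)]

# Select by position: names(DT)[c(1, 2)] is a character vector,
# and mget() turns those strings into the column objects
by_mget <- DT[, mget(names(DT)[c(1, 2)])]

# Both routes should give the same two columns
print(all.equal(by_list, by_mget))
```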
    
  • 2020-12-11 17:10

    The main issue here is that columns in a data.table are referenced as objects, so you cannot use the same syntax as with a data.frame, i.e. no quoted names or numbers.

    So, at least on older versions of data.table, DT[,c("ID", "Capacity")] won't work, for the same reason that DT[,seq(1:2)] won't work.

    However, adding with=FALSE makes the data.table evaluate j the way a data.frame would.

    So DT[,c("ID", "Capacity"), with=FALSE] and DT[,seq(1:2), with=FALSE] now give you what you want.

         ID Capacity
      1:  1      913
      2:  2      602
      3:  3      861
      4:  4      967
      5:  5      374
     ---            
    196: 46      163
    197: 47      254
    198: 48      390
    199: 49      853
    200: 50      486
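
The with=FALSE calls above can be tried on a stand-in table like the one below. The data is hypothetical (the question's sample is not shown), so the values will differ from the printed output, and the variable cols is only an illustration of passing stored column names.

```r
library(data.table)

# Hypothetical stand-in for the question's table
set.seed(42)
DT <- data.table(
  ID       = rep(1:50, 4),
  Capacity = sample(100:1000, 200, replace = TRUE),
  Code     = sample(LETTERS[1:4], 200, replace = TRUE)
)

# with = FALSE: j is treated as column names or positions, data.frame style
by_name <- DT[, c("ID", "Capacity"), with = FALSE]
by_pos  <- DT[, seq(1:2), with = FALSE]

# The same works when the column names are stored in a variable
cols   <- c("ID", "Capacity")
by_var <- DT[, cols, with = FALSE]

print(names(by_name))
print(names(by_pos))
print(names(by_var))
```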
    

    EDIT: as pointed out by @Rich Scriven

  • 2020-12-11 17:22

    On recent versions of data.table, numbers can be used in j to specify columns. This behaviour includes formats such as DT[,1:2] to specify a numeric range of columns. (Note that this syntax does not work on older versions of data.table).

    So why does DT[ , 1:2] work when DT[ , seq(1:2)] does not? The answer is buried in the code for data.table:::`[.data.table`, which includes the lines:

      if (!missing(j)) {
        jsub = replace_dot_alias(substitute(j))
        root = if (is.call(jsub)) 
          as.character(jsub[[1L]])[1L]
        else ""
        if (root == ":" || (root %chin% c("-", "!") && is.call(jsub[[2L]]) && 
            jsub[[2L]][[1L]] == "(" && is.call(jsub[[2L]][[2L]]) && 
            jsub[[2L]][[2L]][[1L]] == ":") || (!length(all.vars(jsub)) && 
                root %chin% c("", "c", "paste", "paste0", "-", "!") && 
                missing(by))) {
          with = FALSE
        }
    

    We can see here that data.table automatically sets with = FALSE for you when it detects that the outermost call in j is the : operator. It has no equivalent handling for seq, so we have to specify with = FALSE ourselves if we want to use the seq syntax.

    DT[ , seq(1:2), with = FALSE]
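
To see the difference the excerpt relies on, the sketch below reproduces only its `root` computation in base R, outside data.table. The helper root_of is a hypothetical name, and this is a simplified illustration rather than the package's full logic.

```r
# Mimics how the excerpt derives `root` from the unevaluated j expression.
# root_of is a hypothetical helper, not part of data.table.
root_of <- function(jsub) {
  if (is.call(jsub)) as.character(jsub[[1L]])[1L] else ""
}

r_colon <- root_of(quote(1:2))        # outermost call is `:`
r_seq   <- root_of(quote(seq(1:2)))   # outermost call is `seq`, the `:` is nested inside
r_neg   <- root_of(quote(-(1:2)))     # outermost call is `-`
r_sym   <- root_of(quote(ID))         # a bare symbol is not a call

print(c(r_colon, r_seq, r_neg, r_sym))

# Only the first case satisfies the `root == ":"` test in the excerpt
print(r_colon == ":")
print(r_seq == ":")
```

Since seq(1:2) has `seq` rather than `:` as its outermost call, it does not match the `root == ":"` branch, which is consistent with needing with = FALSE in the call above.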
    