Preventing column-class inference in fread()

挽巷 2020-12-10 12:53

Is there a way to make fread() mimic the behaviour of read.table(), whereby the class of each variable is set by the data that is read in?

2 answers
  •  眼角桃花
    2020-12-10 13:38

    Option 1: Using a system command

    fread() accepts a system command as its first argument. We can use that to strip the quotes from the first column of the file before it is parsed.

    indt <- data.table::fread("cat test.csv | tr -d '\"'", nrows = 100)
    str(indt)
    # Classes ‘data.table’ and 'data.frame':    100 obs. of  2 variables:
    #  $ x: int  1 2 3 4 5 6 7 8 9 10 ...
    #  $ y: int  1 2 3 4 5 6 7 8 9 10 ...
    #  - attr(*, ".internal.selfref")=<externalptr>
    

    The system command cat test.csv | tr -d '\"' explained:

    • cat test.csv reads the file to standard output
    • | is a pipe, using the output of the previous command as input for the next command
    • tr -d '\"' deletes (-d) all occurrences of double quotes from its input; the backslash only escapes the quote inside the R string, so the shell itself sees tr -d '"'
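Where cat and tr are not available, the same quote-stripping step can be sketched in R itself. This is only an illustration under assumptions: the sample file contents are made up, and the text= argument needs a newer data.table release than the v1.9.5 this answer was written against.

```r
library(data.table)

# Hypothetical sample file: the x column is quoted, as in the question.
path <- tempfile(fileext = ".csv")
writeLines(c("x,y", '"1",1', '"2",2', '"3",3'), path)

# Delete every double quote in R, mirroring what tr -d '"' does in the shell.
lines <- gsub('"', "", readLines(path), fixed = TRUE)

# Hand the cleaned lines to fread() directly
# (assumption: the installed data.table has the text= argument).
indt <- fread(text = lines)
str(indt)
```

Like the tr pipeline, this removes all double quotes, so it is only safe when no quoted field contains the separator.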

    Option 2: Coercion after reading

    Since option 1 doesn't seem to be working on your system, another possibility is to read the file as you did, but convert the x column with type.convert().

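    library(data.table)
    # as.is = TRUE keeps non-numeric strings as character; newer R versions
    # warn when as.is is left unspecified
    indt2 <- fread("test.csv", nrows = 100)[, x := type.convert(x, as.is = TRUE)]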
    str(indt2)
    # Classes ‘data.table’ and 'data.frame':    100 obs. of  2 variables:
    #  $ x: int  1 2 3 4 5 6 7 8 9 10 ...
    #  $ y: int  1 2 3 4 5 6 7 8 9 10 ...
    #  - attr(*, ".internal.selfref")=<externalptr>
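If more than one column comes in as character, the same coercion can be applied to several columns at once. The following is a minimal sketch, not part of the original answer; the table and its column names are hypothetical.

```r
library(data.table)

# Hypothetical table whose numeric columns arrived as character.
dt <- data.table(
  x = c("1", "2", "3"),
  y = c("4", "5", "6"),
  z = c("a", "b", "c")
)

# Re-infer the type of every character column.
cols <- names(dt)[vapply(dt, is.character, logical(1))]

# type.convert() turns x and y into integer; z cannot be parsed as a
# number, so with as.is = TRUE it stays character.
dt[, (cols) := lapply(.SD, type.convert, as.is = TRUE), .SDcols = cols]
str(dt)
```

Selecting the columns by class keeps the call generic; a fixed vector such as c("x", "y") works just as well when the columns are known in advance.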
    

    Side note: I usually prefer to use type.convert() over as.numeric() to avoid the "NAs introduced by coercion" warning triggered in some cases. For example,

    x <- c("1", "4", "NA", "6")
    as.numeric(x)
    # [1]  1  4 NA  6
    # Warning message:
    # NAs introduced by coercion 
    type.convert(x, as.is = TRUE)
    # [1]  1  4 NA  6
    

    But of course you can use as.numeric() as well.
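One difference worth keeping in mind is what happens when a value is not a number at all. The sketch below uses a made-up "n/a" marker to illustrate it: as.numeric() forces the conversion, while type.convert() leaves the vector as character unless the marker is declared through na.strings.

```r
x <- c("1", "4", "n/a", "6")

# as.numeric() always converts: "n/a" becomes NA (the warning is
# suppressed here only to keep the output clean).
forced <- suppressWarnings(as.numeric(x))

# type.convert() converts only if every non-missing entry parses as a
# number; "n/a" does not, so the result stays character.
kept <- type.convert(x, as.is = TRUE)

# Declaring "n/a" as a missing-value marker lets the conversion go through.
conv <- type.convert(x, na.strings = c("NA", "n/a"), as.is = TRUE)

str(forced)
str(kept)
str(conv)
```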


    Note: This answer assumes data.table dev v1.9.5
