Specifying Column Types when Importing xlsx Data to R with Package readxl

梦如初夏 2020-12-13 07:40

I'm importing xlsx 2007 tables into R 3.2.1 patched using package readxl 0.1.0 under Windows 7 64-bit. The tables' size is

6 Answers
  •  不知归路
    2020-12-13 07:54

    Reviewing the source, we see that there is an Rcpp call that returns the guessed column types:

    # From the readxl 0.1.0 source: an R wrapper around the compiled C++ routine
    xlsx_col_types <- function(path, sheet = 0L, na = "", nskip = 0L, n = 100L) {
        .Call('readxl_xlsx_col_types', PACKAGE = 'readxl', path, sheet, na, nskip, n)
    }
    

    You can see that by default (nskip = 0L, n = 100L) the first 100 rows are examined to guess the column types. You can set nskip to skip the header text and increase n (at the cost of a slower read) by doing:

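    library(readxl)
    file_loc <- "path/to/table.xlsx"  # hypothetical path to your workbook

    # Call the C++ routine directly (readxl 0.1.0): skip 1 row, scan up to 10000 rows
    col_types <- .Call('readxl_xlsx_col_types', PACKAGE = 'readxl',
                       path = file_loc, sheet = 0L, na = "",
                       nskip = 1L, n = 10000L)
    
    # "blank" means no non-empty cell was found in the rows scanned: increase n or fall back to "text"
    col_types[col_types == "blank"] <- "text"
    
    raw <- read_excel(path = file_loc, col_types = col_types)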
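To see why a column can come back as "blank" and why a larger n helps, here is a toy model in base R. It is not readxl's C++ logic: `cell_type()`, `guess_col_type()` and `guess_sheet_types()` are hypothetical helpers, each column is a character vector of raw cell contents, and the rules (empty cells are ignored, a value that parses as a number is "numeric", anything else is "text") are a simplification that ignores dates. Only rows nskip + 1 through nskip + n are examined, mirroring the window described above.

```r
# Toy model of window-based column-type guessing (a simplification, NOT readxl's code).
# Each column is a character vector of raw cell contents; "" is an empty cell.

# Hypothetical helper: classify one cell as "blank", "numeric" or "text" (dates ignored).
cell_type <- function(x) {
  if (is.na(x) || x == "") return("blank")
  if (!is.na(suppressWarnings(as.numeric(x)))) return("numeric")
  "text"
}

# Hypothetical helper: guess a column's type from rows (nskip + 1) to (nskip + n) only.
guess_col_type <- function(cells, nskip = 0L, n = 100L) {
  rows <- seq_len(min(length(cells), nskip + n))
  rows <- rows[rows > nskip]
  types <- vapply(cells[rows], cell_type, character(1), USE.NAMES = FALSE)
  types <- setdiff(types, "blank")          # empty cells carry no type information
  if (length(types) == 0) return("blank")   # nothing but empty cells in the window
  if (all(types == "numeric")) "numeric" else "text"
}

# Hypothetical helper: apply the guess to every column of a sheet (a named list).
guess_sheet_types <- function(sheet, nskip = 0L, n = 100L) {
  vapply(sheet, guess_col_type, character(1), nskip = nskip, n = n)
}

# Made-up sheet: a header row, then 200 data rows; "late" is empty for its first 150.
sheet <- list(
  id   = c("id", as.character(1:200)),
  late = c("late", rep("", 150), as.character(151:200))
)

guess_sheet_types(sheet)                          # id "text", late "text" (header counted)
guess_sheet_types(sheet, nskip = 1L)              # id "numeric", late "blank"
guess_sheet_types(sheet, nskip = 1L, n = 10000L)  # id "numeric", late "numeric"

# The same fallback as in the answer: treat columns still "blank" as text.
col_types <- guess_sheet_types(sheet, nskip = 1L)
col_types[col_types == "blank"] <- "text"         # id "numeric", late "text"
```

With the header row inside the window both columns look like text; skipping it fixes id, but late only resolves to numeric once the window reaches past its 150 leading empty cells.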
    

    Without reading the C++ source, it's not immediately clear to me whether nskip = 0L skips the header row (row zero in C++'s zero-based counting) or skips no rows at all. I avoided the ambiguity by using nskip = 1L, since skipping a single row of my dataset doesn't change the overall column type guesses.
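The .Call() route relies on a routine of readxl 0.1.0 that is not part of the package's public interface, so it may not be available in other versions. If upgrading is an option, readxl 1.0.0 and later expose the same controls through read_excel() itself: guess_max sets how many rows are used for guessing, and skip drops rows before anything (column names included) is read. A minimal sketch, assuming readxl >= 1.0.0 is installed and using the example workbook datasets.xlsx that ships with the package (located via readxl_example()) in place of your own file:

```r
# Sketch assuming readxl >= 1.0.0 (not the 0.1.0 release discussed above).
library(readxl)

# Path to an example workbook bundled with readxl; substitute your own file here.
path <- readxl_example("datasets.xlsx")

# Let read_excel() guess types, but from up to 10000 rows instead of the default
# min(1000, n_max); the header row is consumed by col_names = TRUE (the default).
guessed <- read_excel(path, sheet = "iris", guess_max = 10000)

# Or bypass guessing altogether by giving one type per column.
forced <- read_excel(path, sheet = "iris",
                     col_types = c("numeric", "numeric", "numeric", "numeric", "text"))

str(forced)
```

Passing col_types explicitly is the public counterpart of the approach above; guess_max is the simpler fix when the only problem is that the informative rows lie beyond the default guessing window.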
