I'm importing xlsx 2007 tables into R 3.2.1patched using package readxl 0.1.0 under Windows 7 64. The tables' size is
Reviewing the source, we see that there is an Rcpp call that returns the guessed column types:
xlsx_col_types <- function(path, sheet = 0L, na = "", nskip = 0L, n = 100L) {
  .Call('readxl_xlsx_col_types', PACKAGE = 'readxl', path, sheet, na, nskip, n)
}
By default (nskip = 0L, n = 100L), the first 100 rows are checked to guess each column's type. You can change nskip to skip the header text and increase n (at the cost of a much slower runtime) by doing:
library(readxl)
# file_loc is the path to the .xlsx file
col_types <- .Call('readxl_xlsx_col_types', PACKAGE = 'readxl',
                   path = file_loc, sheet = 0L, na = "",
                   nskip = 1L, n = 10000L)
# a column type of "blank" means no values were encountered in the sampled rows;
# increase n or just guess "text"
col_types[col_types == "blank"] <- "text"
raw <- read_excel(path = file_loc, col_types = col_types)
Without looking at the C++ source, it's not immediately clear to me whether nskip = 0L skips the header row (the zeroth row in C++ counting) or skips no rows. I avoided the ambiguity by just using nskip = 1L, since skipping one row of my dataset doesn't affect the overall column type guesses.
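To see why the size of the sampled window matters, here is a toy sketch in base R. It is not readxl's actual algorithm (that lives in the package's C++ code, which I have not reproduced here); guess_type() is a hypothetical helper written only for illustration, and it assumes a column is "numeric" when every sampled non-empty cell parses as a number.

```r
# Toy illustration only: guess_type() is a hypothetical helper, NOT part of readxl.
# It mimics the idea of "look at the first n cells and guess a type".
guess_type <- function(cells, n = 100L) {
  window <- head(cells, n)
  # Drop empty and missing cells before guessing.
  window <- window[!is.na(window) & window != ""]
  if (length(window) == 0L) return("blank")
  # Call the column numeric only if every sampled cell parses as a number.
  parsed <- suppressWarnings(as.numeric(window))
  if (all(!is.na(parsed))) "numeric" else "text"
}

# A column whose first 150 cells look numeric, followed by one text value.
cells <- c(as.character(1:150), "n/a")

guess_small <- guess_type(cells, n = 100L)    # window stops before the text value
guess_large <- guess_type(cells, n = 10000L)  # window reaches the text value

# A column that is empty throughout the sampled window.
empty_then_text <- c(rep("", 120), "late value")
guess_empty <- guess_type(empty_then_text, n = 100L)

print(c(small = guess_small, large = guess_large, empty = guess_empty))
```

With the small window the toy guess misses the late text value, and a column that is empty in the sampled rows comes back as "blank", which is the case the col_types[col_types == "blank"] <- "text" line above guards against.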