Specifying Column Types when Importing xlsx Data to R with Package readxl

梦如初夏 2020-12-13 07:40

I'm importing xlsx 2007 tables into R 3.2.1patched using package readxl 0.1.0 under Windows 7 64. The tables' size is

6 Answers

不知归路 (original poster)

2020-12-13 07:54
Reviewing the source, we see that there is an Rcpp call that returns the guessed column types:
```
xlsx_col_types <- function(path, sheet = 0L, na = "", nskip = 0L, n = 100L) {
    .Call('readxl_xlsx_col_types', PACKAGE = 'readxl', path, sheet, na, nskip, n)
}
```
You can see that the defaults are nskip = 0L and n = 100L, so the first 100 rows are checked to guess each column's type. You can change nskip to skip the header text and increase n (at the cost of a much slower runtime) by doing:
```
library(readxl)

# file_loc is assumed to hold the path to the .xlsx file
col_types <- .Call('readxl_xlsx_col_types', PACKAGE = 'readxl',
                   path = file_loc, sheet = 0L, na = "",
                   nskip = 1L, n = 10000L)

# A column typed "blank" had no values in the rows scanned:
# increase n, or just fall back to "text"
col_types[col_types == "blank"] <- "text"

raw <- read_excel(path = file_loc, col_types = col_types)
```
Without looking at the C++ (Rcpp) source, it's not immediately clear to me whether nskip = 0L skips the header row (the zeroth row in C++ counting) or skips no rows. I avoided the ambiguity by just using nskip = 1L, since skipping one row of my dataset doesn't affect the overall column type guesses.
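
To make the "blank" fallback step easier to follow on its own, here is a minimal sketch in base R. The `guessed` vector is made-up data standing in for what the internal call returns, and `fill_blank_types` is a hypothetical helper name, not part of readxl.

```r
# Made-up guessed types, standing in for the vector returned by the
# internal readxl_xlsx_col_types call (illustrative values, not real output).
guessed <- c("numeric", "blank", "date", "text", "blank")

# Hypothetical helper: replace every "blank" guess with a fallback type,
# so columns that were empty in the scanned rows are read as text.
fill_blank_types <- function(types, fallback = "text") {
  types[types == "blank"] <- fallback
  types
}

col_types <- fill_blank_types(guessed)
print(col_types)
```

The resulting vector could then be passed as `col_types` to `read_excel`, exactly as in the answer above. Wrapping the replacement in a function only makes the fallback type easy to change.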