I'm importing xlsx 2007 tables into R 3.2.1 patched, using package readxl 0.1.0, under Windows 7 64-bit. The tables' size is …
It depends on whether your data is sparse in different places in different columns, and on how sparse it is. I found that having the guesser scan more rows didn't improve the parsing: in a sparse column the majority of cells were still blank and were interpreted as text, even if further down they turn out to be dates, etc.
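You can see this for yourself by asking the internal guesser to scan more rows and comparing the results (a sketch using the same readxl:::xlsx_col_types helper shown further down; "a.xlsx" is a placeholder path):

# Scan 1000 rows instead of the default; with sparse columns the guesses
# tend to stay "blank" or "text" no matter how large n is.
readxl:::xlsx_col_types(path = "a.xlsx", n = 1000)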
One workaround is to generate the first data row of your Excel table so that it contains representative data for every column, and let that row drive the column-type guessing. I don't like this because I want to leave the original data intact.
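If you do take that approach, a minimal sketch is to read the sheet as usual and then drop the dummy row afterwards (this assumes the representative row is data row 1 of "a.xlsx"):

# Column types are guessed from the representative dummy row;
# dropping the row afterwards keeps the guessed types.
dat <- readxl::read_excel("a.xlsx")
dat <- dat[-1, ]  # remove the dummy row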
Another workaround, if you have complete rows somewhere in the spreadsheet, is to use nskip instead of n. This sets the starting point for the column-type guessing. Say data row 117 has a full set of data:
readxl:::xlsx_col_types(path = "a.xlsx", nskip = 116, n = 1)
Note that you can call the function directly, without having to edit the function in the namespace.
You can then use the returned vector of column types to call read_excel:
col_types <- readxl:::xlsx_col_types(path = "a.xlsx", nskip = 116, n = 1)
dat <- readxl::read_excel(path = "a.xlsx", col_types = col_types)
Then you can manually update any columns which it still gets wrong.
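For example, if the guesser still mislabels a column (hypothetical: suppose column 3 actually holds dates), you can overwrite just that entry and re-read:

col_types[3] <- "date"  # hypothetical fix: force column 3 to be read as a date
dat <- readxl::read_excel(path = "a.xlsx", col_types = col_types)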