I have to work with a collection of 120 files of ~2 GB each (525,600 lines x 302 columns). The goal is to compute some statistics and put the results in a clean SQLite database.
If peak memory is not an issue, or you can stream the input in manageable chunks (a sketch of that variant follows below), the following gsub()/fread() hybrid should work: it collapses every run of whitespace into a single delimiter of your choosing (e.g. "\t") before handing the text to fread():
library(data.table)

fread_blank <- function(inputFile, spaceReplace = "\t", n = -1, ...) {
  fread(
    input = paste0(
      gsub(pattern = "[[:space:]]+",     # any run of whitespace...
           replacement = spaceReplace,   # ...becomes a single delimiter
           x = readLines(inputFile, n = n)),
      collapse = "\n"),                  # re-join the lines for fread()
    ...)                                 # extra arguments go to fread()
}
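Usage might look like the following; the file name is illustrative, and n is handy for a quick sanity check before committing to a full 2 GB read:

# Hypothetical usage: peek at the first 1000 lines of one file; any
# further arguments (e.g. colClasses) are passed straight through to fread().
dt <- fread_blank("run01.txt", n = 1000)
str(dt)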
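If a whole file's raw text plus its parsed table won't fit in memory at once, the same idea works chunk-wise over a connection. This is only a sketch under the assumption that the files have no header row (otherwise handle the first chunk separately); fread_blank_chunked, FUN, and chunkSize are names I'm introducing here, not part of data.table:

library(data.table)

fread_blank_chunked <- function(inputFile, FUN, chunkSize = 100000L,
                                spaceReplace = "\t") {
  con <- file(inputFile, open = "r")
  on.exit(close(con))                       # always release the connection
  results <- list()
  repeat {
    lines <- readLines(con, n = chunkSize)  # next block of raw lines
    if (length(lines) == 0L) break          # end of file
    dt <- fread(
      # the trailing "\n" makes fread treat the string as data, not a path
      input = paste0(paste(gsub("[[:space:]]+", spaceReplace, lines),
                           collapse = "\n"), "\n"),
      header = FALSE)
    results[[length(results) + 1L]] <- FUN(dt)  # per-chunk statistics
  }
  rbindlist(results)                        # stack the chunk summaries
}

Each chunk is reduced by FUN (which should return a data.table, e.g. of column sums) so only the summaries accumulate in memory; the stacked result is then easy to write out to the SQLite database.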
I must agree with others that space-delimited files are not an ideal choice, but I come across them fairly often whether I like it or not.