'Embedded nul in string' when importing large CSV (8 GB) with fread()

Posted by 二次信任 on 2019-11-29 12:06:20

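If you're on Linux/macOS, try this:

library(data.table)
file <- "file.csv"
tt <- tempfile()  # or tempfile(tmpdir = "/dev/shm") to keep the copy in RAM (Linux)
# strip every NUL byte with tr, writing the cleaned copy to tt
system(paste("tr -d '\\000' <", shQuote(file), ">", shQuote(tt)))
fread(tt)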
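
Where tr is not available (for example on Windows), the same cleanup can be sketched in base R alone. The helper below, strip_nul(), is a hypothetical name rather than part of data.table; it copies the file in fixed-size chunks so an 8 GB input never has to fit in memory, dropping every 0x00 byte along the way. The 64 MB chunk size is an arbitrary assumption.

```r
# Hypothetical helper (not part of data.table): copy infile to outfile,
# dropping every NUL (0x00) byte, one chunk at a time.
strip_nul <- function(infile, outfile, chunk_bytes = 64 * 1024^2) {
  con_in <- file(infile, "rb")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(outfile, "wb")
  on.exit(close(con_out), add = TRUE)
  removed <- 0
  repeat {
    chunk <- readBin(con_in, what = "raw", n = chunk_bytes)
    if (length(chunk) == 0L) break          # end of file
    is_nul <- chunk == as.raw(0)
    removed <- removed + sum(is_nul)
    writeBin(chunk[!is_nul], con_out)
  }
  invisible(removed)                        # number of NUL bytes dropped
}

# Small demonstration on a file with one embedded NUL byte.
demo_in <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("a,b\n1,"), as.raw(0), charToRaw("2\n")), demo_in)
demo_out <- tempfile(fileext = ".csv")
n_removed <- strip_nul(demo_in, demo_out)
cleaned <- read.csv(demo_out)
```

The cleaned copy in demo_out can then be passed to fread() in place of the original file. This is likely slower than tr on very large files, so it is mainly useful when no Unix tools are at hand.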
Nikita Barsukov

On Windows, a possible option is to install a bash emulator from http://win-bash.sourceforge.net/ and remove the null characters using Linux tools, as described, for example, in "Identifying and removing null characters in UNIX" or in "'Embedded nul in string' error when importing csv with fread".

I think the nonsensical characters appear because the file is compressed. I saw the same thing when trying to read vcf.gz files. fread does not seem to support reading compressed files directly; see e.g. https://github.com/Rdatatable/data.table/issues/717
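
One quick way to test this hypothesis is to look at the first two bytes of the file: gzip streams start with the magic bytes 0x1f 0x8b. The check below is a minimal sketch; is_gzip() is a hypothetical helper name, and it only detects gzip, not other compression formats.

```r
# Hypothetical helper: TRUE if the file starts with the gzip magic bytes.
is_gzip <- function(path) {
  magic <- readBin(path, what = "raw", n = 2L)
  length(magic) == 2L && identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Demonstration: one plain CSV and one gzip-compressed CSV.
plain <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2"), plain)

gz <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz, "w")
writeLines(c("a,b", "1,2"), con)
close(con)

plain_is_gz <- is_gzip(plain)
gz_is_gz <- is_gzip(gz)
```

If the check returns TRUE for a file with a misleading .csv extension, the file needs decompressing rather than NUL stripping.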

readLines() and read.table() can read compressed files, but they are slower than fread().
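
As a rough illustration of the base R route, the sketch below writes a tiny gzip-compressed CSV and reads it back through a gzfile() connection. The file contents and column names are made up for the demonstration.

```r
# Create a small gzip-compressed CSV to read back (demo data only).
gz_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz_path, "w")
writeLines(c("chrom,pos", "chr1,100", "chr2,200"), con)
close(con)

# readLines() and read.csv() open the unopened connection themselves
# and close it again when they are done.
first_lines <- readLines(gzfile(gz_path), n = 2)
df <- read.csv(gzfile(gz_path), stringsAsFactors = FALSE)
```

Peeking at a few lines with readLines() first is a cheap way to confirm that the decompressed content looks like the expected table before committing to a full read.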
