Most efficient way of exporting large (3.9 mill obs) data.frames to text file? [duplicate]

Richie Cotton

At a guess, your machine is short on RAM, and so R is having to use the swap file, which slows things down. If you are being paid to code, then buying more RAM will probably be cheaper than you writing new code.

That said, there are some possibilities. You can export the data to a database and then use that database's facility for writing to a text file. JD Long's answer to this question tells you how to read in files in this way; it shouldn't be too difficult to reverse the process. Alternatively, the bigmemory and ff packages (as mentioned by Davy) could be used for writing such files.
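A minimal sketch of the database route, assuming you stage the data in SQLite via DBI/RSQLite (the database file name, table name and output path are placeholders, and the final export step uses the sqlite3 command-line tool rather than R):

# Sketch only: stage the data frame in SQLite, then dump it as CSV from the shell.
# "big.db", "bigtable" and "output.csv" are placeholder names.
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), "big.db")
dbWriteTable(con, "bigtable", DF)   # DF is the 3.9 million row data frame
dbDisconnect(con)

# Then, outside R:
#   sqlite3 -header -csv big.db "SELECT * FROM bigtable;" > output.csv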

1) If your data frame is all character strings, write.table() is much faster if you first convert it to a matrix.

2) Also write it out in chunks of, say, 1,000,000 rows, but always to the same file, using the argument append = TRUE (a combined sketch of both points follows below).
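Here is a minimal sketch combining both points; the chunk size, separator and output file name are arbitrary choices for illustration:

# Sketch: convert to a character matrix, then append chunks of 1e6 rows to one file.
m <- as.matrix(DF)                  # point 1: a matrix writes faster for all-character data
chunk <- 1e6

for (s in seq(1, nrow(m), by = chunk)) {
  e <- min(s + chunk - 1, nrow(m))
  write.table(m[s:e, , drop = FALSE], "output.txt",
              sep = "\t", row.names = FALSE,
              col.names = (s == 1),   # header only with the first chunk
              append = (s != 1))      # point 2: append the later chunks
}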

Update

After extensive work by Matt Dowle parallelizing and adding other efficiency improvements, fwrite is now as much as 15x faster than write.csv. See the linked answer for more.


Now data.table has an fwrite function, contributed by Otto Seiskari, which seems to be about twice as fast as write.csv in general. See here for some benchmarks.

library(data.table) 
fwrite(DF, "output.csv")

Note that row names are excluded, since the data.table type makes no use of them.

Though I only use it to read very large files (10+ GB), I believe the ff package has functions for writing extremely large data frames.
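A rough sketch of that route, assuming ff's write.csv.ffdf() works as I recall (treat the function names and arguments as assumptions to check against the ff documentation):

# Sketch, assuming write.csv.ffdf() is available; verify against the ff docs.
library(ff)

ffDF <- as.ffdf(DF)                        # coerce the in-memory data frame to an ffdf
write.csv.ffdf(ffDF, file = "output.csv")  # writes the ffdf out chunk by chunk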

Well, as the answer with really large files and R often is, it's best to offload this kind of work to a database. SPSS has ODBC connectivity, and the RODBC package provides an interface from R to SQL.
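A hedged sketch of the upload side with RODBC (the DSN and table names are placeholders; the export back to a text file would then be done with the database's own bulk-export tools):

# Sketch: push the data frame into a database over an ODBC DSN.
# "my_dsn" and "bigtable" are placeholder names.
library(RODBC)

channel <- odbcConnect("my_dsn")
sqlSave(channel, DF, tablename = "bigtable", rownames = FALSE, fast = TRUE)
odbcClose(channel)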

I note that, in the process of checking my information, I have been scooped.
