Python: Fast and efficient way of writing large text file

Submitted on 2019-11-28 13:46:34

Pandas seems like a good tool for this problem. It's easy to get started with, it handles most of the common ways of getting data into Python, and it deals well with mixed data (floats, ints, strings), usually detecting the column types on its own.

Once you have an (R-like) data frame in pandas, it's pretty straightforward to output the frame to csv.

DataFrame.to_csv(path_or_buf, sep='\t')

There are a number of other options you can pass to get your tab-separated file just right.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
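A minimal sketch of the round trip described above; the column names and the output filename are illustrative:

```python
import pandas as pd

# Mixed-type data; pandas infers each column's dtype automatically.
df = pd.DataFrame({"name": ["a", "b"], "count": [1, 2], "score": [0.5, 1.5]})

# sep='\t' makes the output tab-separated; index=False omits the row index.
df.to_csv("out.tsv", sep="\t", index=False)
```

`to_csv` also accepts options such as `header`, `columns`, and `float_format` for finer control over the output.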

Claris

Unless you are running into a performance issue, you can probably write to the file line by line. Python internally uses buffering and will likely give you a nice compromise between performance and memory efficiency.

Python's buffering is separate from OS-level buffering, and you can control it by passing the buffering argument to open().
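For example, a simple line-by-line writer with an explicit buffer size (the 1 MiB value and the filename are illustrative, not recommendations):

```python
# buffering=N (N > 1) sets the size in bytes of the internal chunk buffer,
# so writes are flushed to the OS in large batches rather than per line.
with open("big.txt", "w", buffering=1024 * 1024) as f:
    for i in range(100_000):
        f.write(f"row {i}\t{i * 0.5}\n")
```

With buffering=1 (text mode only) you get line buffering instead; omitting the argument uses a sensible default.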

I think what you might want to do is create a memory mapped file. Take a look at the following documentation to see how you can do this with numpy:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html
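A short sketch of the memmap approach (the filename and array shape are made up for illustration): the array lives on disk, and only the pages you touch are loaded into RAM.

```python
import numpy as np

# Create a disk-backed array; mode="w+" creates/overwrites the file.
mm = np.memmap("data.dat", dtype="float64", mode="w+", shape=(1000, 10))
mm[:] = np.arange(10_000, dtype="float64").reshape(1000, 10)
mm.flush()  # make sure all pages are written to disk

# Reopen read-only; data is paged in lazily as it is accessed.
ro = np.memmap("data.dat", dtype="float64", mode="r", shape=(1000, 10))
print(ro[999, 9])  # prints 9999.0
```

Note that a memmap holds raw binary data, not text, so this fits best when the goal is efficient large-array I/O rather than a human-readable file.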
