Efficient ways to write a large NumPy array to a file


Unless there is a reason for the intermediate files to be human-readable, do not use CSV: round-tripping floating-point values through decimal text typically loses precision, and it is slow.

The most efficient is probably tofile (doc), which is intended for quick dumps of array data to disk when you know all of the attributes of the data ahead of time.
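A minimal sketch of a tofile round trip (the filename and shape here are illustrative; because the raw dump stores no metadata, fromfile returns a flat array and you must reapply the dtype and shape yourself):

```python
import numpy as np

a = np.random.rand(1000, 1000)

# Raw binary dump: fast, but stores no shape or dtype metadata,
# so both must be supplied again when reading the data back.
a.tofile("array.bin")

b = np.fromfile("array.bin", dtype=a.dtype).reshape(1000, 1000)
assert np.array_equal(a, b)
```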

For platform-independent but NumPy-specific saves, you can use save (doc).
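A sketch of the save/load round trip (the .npy filename is arbitrary):

```python
import numpy as np

a = np.random.rand(1000, 1000)

# .npy files record shape, dtype, and byte order, so loading
# needs no extra bookkeeping and works across platforms.
np.save("array.npy", a)

b = np.load("array.npy")
assert np.array_equal(a, b)
```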

NumPy and SciPy also have support for various scientific data formats, such as HDF5 (typically through a package like h5py), if you need portability.
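As an illustration, a round trip through HDF5 using the third-party h5py package (assumed to be installed; the file and dataset names are made up):

```python
import numpy as np
import h5py  # assumes the h5py package is installed

a = np.random.rand(1000, 1000)

# Store the array as a named dataset inside an HDF5 file.
with h5py.File("data.h5", "w") as f:
    f.create_dataset("my_array", data=a)

with h5py.File("data.h5", "r") as f:
    b = f["my_array"][:]  # read the whole dataset back into memory

assert np.array_equal(a, b)
```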

I would recommend looking at the pickle module. It serializes Python objects as streams of bytes, so you can write them to a file or send them over a network, then reinstantiate the objects later.
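A brief sketch of pickling an array to disk and back (the filename and protocol choice are illustrative):

```python
import pickle
import numpy as np

a = np.random.rand(1000, 1000)

# Serialize the array to a byte stream on disk; use a binary-mode
# file and the highest protocol for speed and compactness.
with open("array.pkl", "wb") as f:
    pickle.dump(a, f, protocol=pickle.HIGHEST_PROTOCOL)

with open("array.pkl", "rb") as f:
    b = pickle.load(f)

assert np.array_equal(a, b)
```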


Try Joblib - Fast compressed persistence

One of the key components of joblib is its ability to persist arbitrary Python objects and read them back very quickly. It is particularly efficient for containers that do their heavy lifting with NumPy arrays. The trick to achieving great speed is to save the NumPy arrays in separate files and load them via memory-mapping.
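A minimal sketch with joblib.dump and joblib.load (assumes joblib is installed; the container and filename are illustrative, and mmap_mode applies to uncompressed dumps):

```python
import numpy as np
import joblib  # assumes joblib is installed (pip install joblib)

data = {"weights": np.random.rand(1000, 1000),
        "labels": np.arange(1000)}

# joblib persists containers of NumPy arrays efficiently.
joblib.dump(data, "data.joblib")

# mmap_mode="r" memory-maps the stored arrays instead of
# reading them fully into RAM.
restored = joblib.load("data.joblib", mmap_mode="r")
assert np.array_equal(data["weights"], restored["weights"])
```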

Edit: Newer (2016) blog entry on data persistence in Joblib
