Saving numpy array such that it is readily available without loading

Submitted by 北城以北 on 2020-01-25 12:07:10

Question


I have a 20GB library of images stored as a high-dimensional numpy array. This library allows me to use these images without having to generate them anew each time. My problem is that np.load("mylibrary") takes as long as it would take to generate a couple of those images. So my question is: is there a way to store a numpy array such that it is readily accessible without having to load it?

Edit: I am using PyCharm


Answer 1:


I would suggest h5py, which is a Pythonic interface to the HDF5 binary data format.

It lets you store huge amounts of numerical data, and easily manipulate that data from NumPy. For example, you can slice into multi-terabyte datasets stored on disk, as if they were real NumPy arrays. Thousands of datasets can be stored in a single file, categorized and tagged however you want.
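As a rough illustration of the slicing behaviour described above, here is a minimal sketch. The file name, the dataset name "images", the array shape, and the one-image-per-chunk layout are all assumptions made for the example and are not taken from the question.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical stand-in for the image library: 100 images of 64x64 pixels.
images = np.random.rand(100, 64, 64).astype(np.float32)

# Hypothetical file location; a temporary directory keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "mylibrary.h5")

# Write the library once. Chunking by single image is an assumption that
# suits reading whole images one at a time.
with h5py.File(path, "w") as f:
    f.create_dataset("images", data=images, chunks=(1, 64, 64))

# Opening the file does not read the image data; slicing the dataset
# pulls only the requested images from disk into a NumPy array.
with h5py.File(path, "r") as f:
    dset = f["images"]
    dataset_shape = dset.shape
    subset = dset[10:13]  # images 10, 11 and 12 only
```

With a layout like this, the cost of getting at a few images should depend on how many you slice out rather than on the total size of the file.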

You can also use PyTables. It is another HDF5 interface for Python and NumPy.

PyTables is a package for managing hierarchical datasets, designed to cope efficiently and easily with extremely large amounts of data. You can download PyTables and use it for free. Documentation, usage examples and presentations are available from the PyTables project.

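numpy.memmap is another option. It may, however, be slower than HDF5, depending on the access pattern. Another condition is that memory-mapped files cannot be larger than 2GB on 32-bit systems; on 64-bit systems this limit does not apply.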
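For the memory-mapping option, a minimal sketch is shown below. It assumes the library was saved with np.save as a .npy file; the file name and array shape are hypothetical. Passing mmap_mode to np.load returns a numpy.memmap, so the call returns without reading the whole array, and data is fetched from disk only when it is indexed.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the image library: 100 images of 64x64 pixels.
images = np.arange(100 * 64 * 64, dtype=np.float32).reshape(100, 64, 64)

# Hypothetical file location; saved once in NumPy's .npy format.
path = os.path.join(tempfile.mkdtemp(), "mylibrary.npy")
np.save(path, images)

# mmap_mode="r" maps the file read-only instead of loading it into memory.
library = np.load(path, mmap_mode="r")

# Indexing touches only the part of the file that is needed;
# np.array() copies that single image into ordinary memory.
one_image = np.array(library[42])
```

The mapped array can be indexed like a normal NumPy array, so existing code that consumes the library should need few changes.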



Source: https://stackoverflow.com/questions/36291562/saving-numpy-array-such-that-it-is-readily-available-without-loading
