I have a script that generates two-dimensional NumPy arrays with dtype=float and a shape on the order of (1e3, 1e6).
For really big arrays, I've heard about several solutions, and they mostly rely on being lazy with the I/O:
- Use numpy.memmap, which behaves like an ndarray (any class accepting an ndarray accepts a memmap)
- Use Python bindings for HDF5, a big-data-ready file format, like PyTables or h5py
- Python's pickling system (out of the race, mentioned for Pythonicity rather than speed)
From the docs of numpy.memmap:
Create a memory-map to an array stored in a binary file on disk.
Memory-mapped files are used for accessing small segments of large files on disk, without reading the entire file into memory.
The memmap object can be used anywhere an ndarray is accepted. Given any memmap fp, isinstance(fp, numpy.ndarray) returns True.
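A minimal sketch of that lazy access; the file name and the demo shape are placeholders, not anything from your script:

```python
import numpy as np

# Demo shape; the real arrays would be on the order of (1e3, 1e6).
shape = (1_000, 10_000)

# mode="w+" creates the file; only pages you actually touch are held in memory.
fp = np.memmap("arrays.dat", dtype="float64", mode="w+", shape=shape)
fp[0, :100] = np.random.rand(100)   # write a small slice
fp.flush()                          # push dirty pages to disk

print(isinstance(fp, np.ndarray))   # True: it quacks like an ndarray

# Reopen read-only later; dtype and shape must be supplied again,
# because the raw binary file stores no metadata.
ro = np.memmap("arrays.dat", dtype="float64", mode="r", shape=shape)
chunk = ro[0, :100]                 # only this region is read from disk
```

Note that, unlike np.save, a raw memmap file carries no header, so you are responsible for remembering the dtype and shape.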
From the h5py docs:
Lets you store huge amounts of numerical data, and easily manipulate that data from NumPy. For example, you can slice into multi-terabyte datasets stored on disk, as if they were real NumPy arrays. Thousands of datasets can be stored in a single file, categorized and tagged however you want.
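A rough sketch of the same workflow with h5py; the file name, dataset name, and demo shape are made up for illustration:

```python
import numpy as np
import h5py

# Demo shape; the real arrays would be on the order of (1e3, 1e6).
shape = (1_000, 10_000)

# Write: the dataset lives on disk and can be filled incrementally.
with h5py.File("arrays.h5", "w") as f:
    dset = f.create_dataset("my_array", shape=shape, dtype="float64")
    dset[0, :100] = np.random.rand(100)

# Read: slicing pulls only the requested region into memory.
with h5py.File("arrays.h5", "r") as f:
    chunk = f["my_array"][0, :100]
```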
The format also supports compressing the data in various ways (more bits loaded for the same I/O read). This makes individual records less easy to query, but in your case (purely loading and dumping whole arrays) it might be efficient.
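For example, chunked, gzip-compressed storage can be requested when the dataset is created; the chunk shape and compression level below are illustrative guesses that would need tuning against your real access pattern:

```python
import h5py

with h5py.File("arrays_compressed.h5", "w") as f:
    f.create_dataset(
        "my_array",
        shape=(1_000, 10_000),    # demo shape
        dtype="float64",
        chunks=(1, 10_000),       # one row per chunk (illustrative)
        compression="gzip",
        compression_opts=4,       # gzip level 0-9: CPU vs. I/O trade-off
    )
```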