I want to map a big Fortran record (12G) on hard disk to a numpy array (mapping instead of loading, to save memory). The data stored in the Fortran record is not contiguous.
I posted another answer because, for the example given here, numpy.memmap worked:
import numpy as np

offset = 0
data1 = np.memmap('tmp', dtype='i', mode='r+', order='F',
                  offset=offset, shape=(size1,))
offset += size1 * byte_size
data2 = np.memmap('tmp', dtype='i', mode='r+', order='F',
                  offset=offset, shape=(size2,))
offset += size2 * byte_size   # advance by the size of the segment just mapped
data3 = np.memmap('tmp', dtype='i', mode='r+', order='F',
                  offset=offset, shape=(size3,))
where byte_size is the element size in bytes: for int32, byte_size = 32 // 8 = 4; for int16, byte_size = 16 // 8 = 2, and so forth. The offset must be a whole number of bytes, so use integer division (or simply np.dtype('int32').itemsize).
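As a minimal, self-contained sketch of that layout (the segment lengths size1, size2, size3, the int32 element type, and the scratch file name 'tmp' are all assumptions, not values from the question), the item size can be taken straight from the dtype instead of hard-coding the bit width:

import numpy as np

# Assumed segment lengths, in elements (not bytes).
size1, size2, size3 = 5, 3, 4
dtype = np.dtype('int32')
byte_size = dtype.itemsize            # 4 for int32, 2 for int16, ...

# mode='r+' expects an existing file, so create a scratch file to map.
np.arange(size1 + size2 + size3, dtype=dtype).tofile('tmp')

offset = 0
data1 = np.memmap('tmp', dtype=dtype, mode='r+', offset=offset, shape=(size1,))
offset += size1 * byte_size
data2 = np.memmap('tmp', dtype=dtype, mode='r+', offset=offset, shape=(size2,))
offset += size2 * byte_size
data3 = np.memmap('tmp', dtype=dtype, mode='r+', offset=offset, shape=(size3,))

print(data1[:], data2[:], data3[:])   # three adjacent, non-overlapping views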
If the sizes are constant, you can map the data as a 2D array like:
shape = (total_length // size, size)
data = np.memmap('tmp', dtype='i', mode='r+', order='F', shape=shape)
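As a small usage sketch (the record length, total length, element type, and the 'tmp' file are assumed for illustration); note that with order='F' consecutive elements of the file fill the array column by column:

import numpy as np

size = 4                               # assumed fixed record length, in elements
total_length = 12                      # assumed total number of elements
np.arange(total_length, dtype='i').tofile('tmp')   # scratch file to map

shape = (total_length // size, size)   # integer division: shape entries must be ints
data = np.memmap('tmp', dtype='i', mode='r+', order='F', shape=shape)
print(data.shape)                      # (3, 4)
print(data[:, 0])                      # first column: elements 0, 1, 2 of the file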
You can modify the memmap object as much as you like. It is even possible to create arrays that share the same elements; in that case, changes made in one are automatically reflected in the other.
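As a hedged illustration of that sharing (the 'tmp' file and the sizes are assumptions): two memmaps opened on overlapping regions of the same file see each other's writes on typical platforms, and a plain slice of a memmap is itself a view of the same elements:

import numpy as np

np.zeros(8, dtype='i').tofile('tmp')   # small scratch file

a = np.memmap('tmp', dtype='i', mode='r+', shape=(8,))
b = np.memmap('tmp', dtype='i', mode='r+', shape=(4,),
              offset=4 * a.dtype.itemsize)      # maps elements 4..7 of the file

a[4] = 99          # write through the first mapping
print(b[0])        # 99 -- the second mapping covers the same bytes

view = a[2:6]      # slicing a memmap gives a view, not a copy
view[0] = 7
print(a[2])        # 7 -- the change shows up in the parent array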
Other references:
Working with big data in python and numpy, not enough ram, how to save partial results on disc?
The numpy.memmap documentation.