Python memory mapping


I am working with big data and I have matrices with sizes like 2000x100000, so in order to work faster I tried using numpy.memmap to avoid storing this large matrix in memory.

1 Answer

    The NPY format is not simply a dump of the array's data to a file. It begins with a header that contains, among other things, the metadata that defines the array's data type and shape. When you use np.memmap directly, as you have done, the memory map does not account for that header, so the header bytes are interpreted as array data. To create a memory-mapped view of a NPY file, use the mmap_mode option of np.load instead. (A sketch after the file is created below shows how to read the header yourself.)

    Here's an example. First, create a NPY file:

    In [1]: import numpy as np

    In [2]: a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

    In [3]: np.save('a.npy', a)
    

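    As an aside, you can read that header yourself with the helpers in np.lib.format. This is a minimal sketch, assuming the default version 1.0 header that np.save writes for a small array like this one (data_offset is just a name chosen here):

    import numpy as np

    with open('a.npy', 'rb') as f:
        version = np.lib.format.read_magic(f)    # format version, e.g. (1, 0)
        shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(f)
        print(version, shape, fortran_order, dtype)
        # expected here: (1, 0) (2, 3) False float64
        data_offset = f.tell()                   # the raw array bytes start here
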
    Read it back in with np.load:

    In [4]: a1 = np.load('a.npy')

    In [5]: a1
    Out[5]:
    array([[ 1.,  2.,  3.],
           [ 4.,  5.,  6.]])
    

    View the file incorrectly with np.memmap:

    In [6]: a2 = np.memmap('a.npy', dtype=np.float64, mode='r', shape=(2, 3))

    In [7]: a2
    Out[7]:
    memmap([[  1.87585069e-309,   1.17119999e+171,   5.22741680e-037],
            [  8.44740097e+252,   2.65141232e+180,   9.92152605e+247]])
    
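    Those garbage values are just the file's header bytes reinterpreted as float64. If you really do want np.memmap, you can make it work by skipping the header with its offset argument. A sketch, reusing shape, dtype, and data_offset from the header-reading snippet above (a2_fixed is a name chosen here; this assumes fortran_order is False, i.e. C order, as np.save produced in this case):

    a2_fixed = np.memmap('a.npy', dtype=dtype, mode='r',
                         shape=shape, offset=data_offset)
    # a2_fixed now shows the real values: [[1., 2., 3.], [4., 5., 6.]]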

    Create a memmap using np.load with the option mmap_mode='r':

    In [8]: a3 = np.load('a.npy', mmap_mode='r')

    In [9]: a3
    Out[9]:
    memmap([[ 1.,  2.,  3.],
            [ 4.,  5.,  6.]])
    
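    For matrices of the size in the question, the payoff of mmap_mode='r' is that a memory-mapped array reads pages from disk only as you access them, so you can work through a 2000x100000 matrix piece by piece. A sketch, with 'big.npy' standing in for a hypothetical large file saved earlier with np.save:

    big = np.load('big.npy', mmap_mode='r')
    block = big[:, :1000]            # still a memmap view; pages load on access
    first_row_mean = big[0].mean()   # reads only the pages backing row 0

    np.load also accepts mmap_mode='r+' if you need to write changes back to the file, and mmap_mode='c' for copy-on-write.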