Does np.nan in a NumPy array occupy memory?

忘了有多久 2021-01-21 00:57

I have a huge CSV file that cannot be loaded into memory. Converting it to libsvm format may save some memory. There are many NaN values in the CSV file. If I read lines and store

3 Answers
  •  盖世英雄少女心
    2021-01-21 01:25

    According to sys.getsizeof(), it does. A quick example:

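    import sys
    import numpy as np

    # x holds integers; y is float64 because NaN is a floating-point value
    x = np.array([1,2,3])
    y = np.array([1,np.nan,3])

    # getsizeof() reports the array object's total size in bytes
    x_size = sys.getsizeof(x)
    y_size = sys.getsizeof(y)
    print(x_size)
    print(y_size)
    print(y_size == x_size)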
    

    On my machine this prints the following (the exact byte counts depend on the NumPy version and platform):

     120
     120 
     True 
    

    So my conclusion is that a NaN uses as much memory as any other entry: it is an ordinary floating-point value stored in the array like the rest.
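A more direct check than sys.getsizeof() is to look at the array's own `itemsize` and `nbytes` attributes, which describe only the data buffer and not the Python object overhead. The sketch below uses two float arrays (my own choice, so that both share one dtype) and shows that the array containing NaN has the same per-element and total size.

```python
import numpy as np

# Two float64 arrays of equal length; one contains a NaN.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, np.nan, 3.0])

# itemsize: bytes per element; nbytes: bytes used by the data buffer.
print(x.dtype, x.itemsize, x.nbytes)
print(y.dtype, y.itemsize, y.nbytes)

# NaN is just another float64 value, so the sizes match.
print(x.nbytes == y.nbytes)
```

Because `nbytes` is simply the element count times `itemsize`, the number of NaN values in the array has no effect on it.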

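    Instead, you could use sparse matrices (scipy.sparse), which store only the nonzero entries and are therefore more memory efficient when most values are zero. Note that NaN is not zero, so missing values only save space if they are left out or replaced with zero before conversion. SciPy also strongly discourages applying NumPy functions directly to sparse matrices (https://docs.scipy.org/doc/scipy/reference/sparse.html), since NumPy may not convert them correctly.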
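To illustrate the sparse-matrix suggestion, here is a minimal sketch using a small made-up array. It assumes that treating missing values as zero is acceptable for the task, which may not hold for every dataset. It compares the number of explicitly stored entries when NaN values are kept with the number after they are replaced by zero.

```python
import numpy as np
from scipy import sparse

# Small illustrative array (hypothetical data) with NaN and zero entries.
dense = np.array([[1.0, np.nan, 0.0],
                  [np.nan, np.nan, 2.0]])

# NaN is not zero, so a direct conversion stores every NaN explicitly.
with_nan = sparse.csr_matrix(dense)

# Assumption: missing values may be treated as zero for this task.
without_nan = sparse.csr_matrix(np.nan_to_num(dense, nan=0.0))

print(with_nan.nnz)     # stored entries, NaN included
print(without_nan.nnz)  # stored entries after NaN -> 0
```

For a file too large to load at once, the same idea would have to be applied piece by piece, for example by building one sparse block per chunk of rows and stacking the blocks with `scipy.sparse.vstack`.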
