I have a huge CSV file that cannot be loaded into memory. Converting it to libsvm format may save some memory. The CSV file contains many NaN values. If I read lines and store
According to the getsizeof() function from the sys module, it does. A simple and quick example:
import sys
import numpy as np
x = np.array([1,2,3])
y = np.array([1,np.nan,3])
x_size = sys.getsizeof(x)
y_size = sys.getsizeof(y)
print(x_size)
print(y_size)
print(y_size == x_size)
This should print something like the following (the exact byte counts depend on the NumPy version and platform):
120
120
True
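So my conclusion is that a NaN uses as much memory as a normal entry. In a float array, NaN is just an ordinary float64 value and takes 8 bytes like any other number.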
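Since sys.getsizeof() also counts the array object's overhead, a more direct check is to look at the bytes of the data buffer itself. A minimal sketch, assuming both arrays are float64 (the variable names are just for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, np.nan, 3.0])

# Both arrays are float64, so every element, NaN included, occupies 8 bytes.
print(y.dtype, y.itemsize)

# nbytes is the size of the data buffer only (no object overhead).
print(x.nbytes, y.nbytes)
```

Both arrays report the same nbytes, so the NaN does not save anything in a dense array.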
Instead, you could use sparse matrices (scipy.sparse), which do not store zero entries at all and are therefore more memory efficient. Note that NaN is not zero, so the NaN values have to be dropped or replaced with zero before they stop taking up space. SciPy also strongly discourages applying NumPy functions directly to sparse matrices (https://docs.scipy.org/doc/scipy/reference/sparse.html), since NumPy might not interpret them correctly.
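To illustrate the sparse approach, here is a rough sketch with made-up data (a hypothetical 1000 x 100 matrix where roughly 90% of the entries are NaN). It assumes it is acceptable to treat missing values as implicit zeros, which may not hold for every dataset. The helper csr_bytes is not part of SciPy, it is only defined here for the comparison:

```python
import numpy as np
from scipy import sparse

# Hypothetical data: 1000 x 100 matrix in which about 90% of the entries are NaN.
rng = np.random.default_rng(0)
dense = rng.random((1000, 100))
dense[rng.random((1000, 100)) < 0.9] = np.nan

# NaN is not zero, so a direct conversion stores every NaN explicitly.
naive = sparse.csr_matrix(dense)

# Assumption: missing values may be treated as implicit zeros.
filled = np.nan_to_num(dense, nan=0.0)
compact = sparse.csr_matrix(filled)

def csr_bytes(m):
    """Bytes used by the three arrays that back a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

print("dense bytes:        ", dense.nbytes)
print("CSR with NaN stored:", csr_bytes(naive))
print("CSR with NaN as 0:  ", csr_bytes(compact))
```

Only the matrix built after replacing NaN should come out smaller than the dense array. The direct conversion keeps every NaN and additionally pays for the index arrays.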