Does np.nan in a NumPy array occupy memory?

忘了有多久 2021-01-21 00:57

I have a huge CSV file that cannot be loaded into memory. Converting it to libsvm format may save some memory. There are many NaN values in the CSV file. If I read lines and store

3 Answers

盖世英雄少女心 (OP)

2021-01-21 01:25
According to getsizeof() from the sys module, it does. A quick example:
```
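import sys
import numpy as np

x = np.array([1, 2, 3])       # integer array, no NaN
y = np.array([1, np.nan, 3])  # NaN forces a float64 array

# getsizeof reports the array object's size in bytes (header plus owned data)
x_size = sys.getsizeof(x)
y_size = sys.getsizeof(y)
print(x_size)
print(y_size)
print(y_size == x_size)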
```
This should print something like the following (the exact byte counts depend on the NumPy version and platform):
```
120
120
True
```
So my conclusion is that a NaN uses as much memory as any other entry: it is an ordinary floating-point value stored in the array's data buffer.
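As a cross-check, here is a minimal sketch that compares only the data buffers, using the nbytes and itemsize attributes instead of getsizeof(). Both arrays are written as floats here (my choice, not part of the example above) so that the comparison does not depend on the platform's default integer width:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])     # float64, no NaN
y = np.array([1.0, np.nan, 3.0])  # float64, one NaN

# nbytes counts only the data buffer: number of elements * itemsize
print(x.dtype, x.itemsize, x.nbytes)
print(y.dtype, y.itemsize, y.nbytes)

# A NaN scalar is a regular float64 bit pattern, so it has the same width
nan_scalar_bytes = np.float64(np.nan).nbytes
print(nan_scalar_bytes)
```

Both arrays report the same itemsize and nbytes, which is consistent with NaN being stored like any other float64 value.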

Instead, you could use sparse matrices (scipy.sparse), which do not store zero entries at all and are therefore more memory efficient. Note that NaN is not zero, so missing values have to be dropped or replaced with zeros before they stop taking up space. SciPy also strongly discourages applying NumPy functions directly to sparse matrices (https://docs.scipy.org/doc/scipy/reference/sparse.html), since NumPy might not interpret them correctly.
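A rough sketch of the sparse approach, under the assumption that missing values can simply be left out of the matrix. The `rows` list is a hypothetical stand-in for lines parsed one at a time from the CSV file, so that the full dense array never has to exist in memory:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for rows parsed one at a time from the CSV file
rows = [
    [1.0, np.nan, 0.0],
    [np.nan, np.nan, 2.0],
]

row_idx, col_idx, values = [], [], []
for i, row in enumerate(rows):
    for j, value in enumerate(row):
        # Assumption: NaN (missing) and 0 can both be omitted
        if not np.isnan(value) and value != 0.0:
            row_idx.append(i)
            col_idx.append(j)
            values.append(value)

shape = (len(rows), len(rows[0]))
matrix = sparse.coo_matrix((values, (row_idx, col_idx)), shape=shape).tocsr()
print(matrix.nnz)  # number of entries actually stored

# For contrast: converting the dense array directly keeps every NaN as a stored entry
naive = sparse.csr_matrix(np.array(rows))
print(naive.nnz)
```

Whether dropping NaN is acceptable depends on the downstream model: once the entries are omitted, a missing value and a true zero can no longer be told apart.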