Efficient serialization of numpy boolean arrays

一整个雨季 2020-12-10 15:07

I have hundreds of thousands of NumPy boolean arrays that I would like to use as keys to a dictionary. (The values of this dictionary are the number of times we've observed …

3 answers

旧时难觅i (OP)

2020-12-10 15:33
I would convert the array to a bitfield using `np.packbits`. This is fairly memory-efficient because it uses all eight bits of each byte, and the code is still relatively simple.
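```
import numpy as np

array = np.array([True, False] * 20)
# Pack 8 booleans into each byte; tobytes() gives a hashable bytes key
Hash = np.packbits(array).tobytes()
counts = {}
counts[Hash] = 10
# Unpack the bits, convert back to bool and trim the zero padding
print(np.unpackbits(np.frombuffer(Hash, dtype=np.uint8)).astype(bool)[:len(array)])
```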
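To make the memory saving concrete, here is a small sketch comparing the raw bytes of a boolean array (NumPy stores one byte per `bool` element) with the packed form. It also shows the bit layout `np.packbits` uses by default: the first element becomes the most significant bit, and a short final byte is padded with zero bits. The variable names are illustrative only.

```python
import numpy as np

array = np.array([True, False] * 20)

raw_key = array.tobytes()                  # one byte per boolean element
packed_key = np.packbits(array).tobytes()  # eight booleans per byte

print(len(raw_key), len(packed_key))       # 40 5

# Default bitorder is "big": [True, False, True] -> 0b10100000
print(np.packbits(np.array([True, False, True])))  # [160]
```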
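Be careful with variable-length boolean arrays: `np.packbits` pads the last byte with zero bits, so the key does not distinguish, for example, an all-False array of 6 elements from one of 7 (both pack to the single byte 0). For multidimensional arrays you will need some reshaping: `np.packbits` flattens its input by default, so the shape has to be stored and restored separately.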
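One way to handle both caveats, sketched under the assumption that an array's shape should be part of its identity: store the shape next to the packed bits, so arrays of different lengths or dimensions can never collide, and use `collections.Counter` to count observations. The helpers `make_key` and `restore` are hypothetical names, not part of NumPy.

```python
from collections import Counter

import numpy as np


def make_key(arr) -> tuple:
    """Hypothetical helper: hashable key that keeps the shape next to the bits."""
    arr = np.asarray(arr, dtype=bool)
    return (arr.shape, np.packbits(arr.ravel()).tobytes())


def restore(key: tuple) -> np.ndarray:
    """Hypothetical helper: invert make_key (unpack, drop padding, reshape)."""
    shape, packed = key
    size = int(np.prod(shape))
    bits = np.unpackbits(np.frombuffer(packed, dtype=np.uint8), count=size)
    return bits.astype(bool).reshape(shape)


counts = Counter()
counts[make_key(np.zeros(6, dtype=bool))] += 1
counts[make_key(np.zeros(7, dtype=bool))] += 1
counts[make_key(np.zeros(7, dtype=bool))] += 1

print(len(counts))  # 2: the 6- and 7-element arrays get different keys
```

The shape tuple adds a little overhead per key, but it makes the key unambiguous and lets `restore` rebuild the original array exactly.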

If this is still not efficient enough and your arrays are large, you may be able to reduce memory further by compressing the packed bytes:
```
import bz2
# compresslevel=1 is the fastest (and least thorough) bz2 setting
Hash_compressed = bz2.compress(Hash, compresslevel=1)
```
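This will not help for random, incompressible data, though, and for small arrays the fixed overhead of the bz2 format can make the compressed key longer than the packed one.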
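A rough way to check whether compression pays off for your own data is to compare the packed length with the `bz2`-compressed length. The sketch below does this for a highly regular array and for random bits; the array size and the random seed are arbitrary choices for illustration, not values from the original answer.

```python
import bz2

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

regular = np.array([True, False] * (n // 2))  # highly repetitive pattern
random_bits = rng.random(n) < 0.5             # independent bits, roughly incompressible

sizes = {}
for name, arr in [("regular", regular), ("random", random_bits)]:
    packed = np.packbits(arr).tobytes()
    compressed = bz2.compress(packed, compresslevel=1)
    sizes[name] = (len(packed), len(compressed))
    print(name, "packed:", len(packed), "compressed:", len(compressed))
```

On repetitive data the compressed key is a small fraction of the packed one; on random bits it stays about the same size or grows slightly, which is the case the answer warns about.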