Efficient serialization of numpy boolean arrays

一整个雨季 2020-12-10 15:07

I have hundreds of thousands of NumPy boolean arrays that I would like to use as keys to a dictionary. (The values of this dictionary are the number of times we've observed

3 answers
  •  旧时难觅i
    2020-12-10 15:33

    I would convert the array to a bit field using np.packbits. This is fairly memory efficient because it uses all 8 bits of each byte, and the code is still relatively simple.

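    import numpy as np

    array = np.array([True, False] * 20)
    # Pack 8 booleans per byte; bytes objects are hashable, so they can be dict keys
    Hash = np.packbits(array).tobytes()
    counts = {}
    counts[Hash] = 10
    # Recover the array and drop any padding bits packbits added to fill the last byte
    print(np.unpackbits(np.frombuffer(Hash, np.uint8)).astype(bool)[:len(array)])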
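
    For the counting use case described in the question, the packed bytes can be fed straight into a counter. The following is only a minimal sketch: the helper name pack_key and the sample observations are made up for illustration, and it assumes all arrays have the same length.

```python
from collections import Counter

import numpy as np


def pack_key(arr):
    """Hypothetical helper: pack a 1-D boolean array into a hashable bytes key."""
    return np.packbits(np.asarray(arr, dtype=bool)).tobytes()


# Made-up sample data; all arrays are assumed to have the same length.
observations = [
    np.array([True, False, True]),
    np.array([True, False, True]),
    np.array([False, False, True]),
]

# Counter is a dict subclass that maps each key to the number of times it was seen.
counts = Counter(pack_key(a) for a in observations)
```

    Looking up counts[pack_key(some_array)] then returns how often that array was observed, or 0 if it never was.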
    

    Be careful with variable-length boolean arrays: the code does not distinguish between, for example, an all-False array of 6 members and one of 7 members, because both pack to the same single zero byte. For multidimensional arrays you will also need some reshaping.
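
    One possible way around both caveats is to store the shape next to the packed bytes. This is a sketch, not part of the original answer: the function names array_to_key and key_to_array are hypothetical, and the key becomes a (shape, bytes) tuple, which costs a little extra memory per key.

```python
import numpy as np


def array_to_key(arr):
    """Hypothetical helper: return a hashable (shape, packed bytes) key."""
    arr = np.asarray(arr, dtype=bool)
    # ravel() flattens multidimensional arrays before packing
    return arr.shape, np.packbits(arr.ravel()).tobytes()


def key_to_array(key):
    """Hypothetical helper: rebuild the boolean array from a (shape, bytes) key."""
    shape, packed = key
    n = int(np.prod(shape))  # number of real bits, excluding padding
    bits = np.unpackbits(np.frombuffer(packed, dtype=np.uint8))[:n]
    return bits.astype(bool).reshape(shape)


# Arrays of length 6 and 7 now produce different keys
a6 = np.zeros(6, dtype=bool)
a7 = np.zeros(7, dtype=bool)
keys_differ = array_to_key(a6) != array_to_key(a7)

# A 2-D array survives the round trip
grid = np.array([[True, False, True], [False, False, True]])
restored = key_to_array(array_to_key(grid))
```

    If all arrays are known to share one fixed shape, the plain bytes key is enough and the shape can be kept once, outside the dictionary.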

    If this is still not efficient enough and your arrays are large, you might be able to reduce memory further by compressing the packed bytes:

    import bz2
    # Hash is the packed bytes object from the snippet above; 1 is the lowest compression level
    Hash_compressed = bz2.compress(Hash, 1)
    

    This does not help for random, incompressible data, though; for such data the compressed output can even be slightly larger than the input.
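
    To see whether compression pays off for a given kind of data, the packed and compressed sizes can be compared directly. The sketch below is illustrative only: the function name, the array sizes, and the two test arrays (one sparse, one random with a fixed seed) are assumptions chosen for the demonstration.

```python
import bz2

import numpy as np


def packed_and_compressed_sizes(arr):
    """Hypothetical helper: return (packed size, bz2-compressed size) in bytes."""
    packed = np.packbits(arr).tobytes()
    compressed = bz2.compress(packed, 1)
    # Compression is lossless: decompressing gives back the packed bytes
    assert bz2.decompress(compressed) == packed
    return len(packed), len(compressed)


# Highly regular data: one True every 1000 elements
sparse = np.zeros(1_000_000, dtype=bool)
sparse[::1000] = True

# Random data: each element is True with probability 0.5 (seed fixed for repeatability)
rng = np.random.default_rng(0)
random_bits = rng.random(1_000_000) < 0.5

sparse_sizes = packed_and_compressed_sizes(sparse)
random_sizes = packed_and_compressed_sizes(random_bits)
print("sparse:", sparse_sizes)
print("random:", random_sizes)
```

    Note that compressed keys have to be decompressed before the array can be recovered, so this trades CPU time for memory.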
