Decreasing the size of cPickle objects

悲哀的现实 2020-12-07 15:37

I am running code that creates large objects, containing multiple user-defined classes, which I must then serialize for later use. From what I can tell, only pickling is ver

3 Answers
  •  刺人心 (OP)
     2020-12-07 16:27

    If you must use pickle and no other serialization method works for you, you can always pipe the pickle through bzip2. The only problem is that bzip2 is a little bit slow... gzip should be faster, but the resulting file is almost 2x bigger:

    In [1]: class Test(object):
                def __init__(self):
                    self.x = 3841984789317471348934788731984731749374
                    self.y = 'kdjsaflkjda;sjfkdjsf;klsdjakfjdafjdskfl;adsjfl;dasjf;ljfdlf'
            l = [Test() for i in range(1000000)]
    
    In [2]: import cPickle as pickle          
            with open('test.pickle', 'wb') as f:
                pickle.dump(l, f)
            !ls -lh test.pickle
    -rw-r--r--  1 viktor  staff    88M Aug 27 22:45 test.pickle
    
    In [3]: import bz2
            import cPickle as pickle
            with bz2.BZ2File('test.pbz2', 'w') as f:
                pickle.dump(l, f)
            !ls -lh test.pbz2
    -rw-r--r--  1 viktor  staff   2.3M Aug 27 22:47 test.pbz2
    
    In [4]: import gzip
            import cPickle as pickle
            with gzip.GzipFile('test.pgz', 'w') as f:
                pickle.dump(l, f)
            !ls -lh test.pgz
    -rw-r--r--  1 viktor  staff   4.8M Aug 27 22:51 test.pgz
    

    So the bzip2 file is almost 40x smaller and the gzip file about 20x smaller than the raw pickle. And gzip is pretty close in performance to raw cPickle, as you can see:

    cPickle : best of 3: 18.9 s per loop
    bzip2   : best of 3: 54.6 s per loop
    gzip    : best of 3: 24.4 s per loop
    
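    For completeness, here is a minimal sketch of reading the compressed pickles back in. It assumes the file names from the cells above; on Python 3 there is no cPickle module, so the try/except import falls back to the built-in pickle:

        import bz2
        import gzip

        try:
            import cPickle as pickle   # Python 2
        except ImportError:
            import pickle              # Python 3: the C implementation is built into pickle

        # Load the bzip2-compressed pickle
        with bz2.BZ2File('test.pbz2', 'rb') as f:
            l_bz2 = pickle.load(f)

        # Load the gzip-compressed pickle
        with gzip.GzipFile('test.pgz', 'rb') as f:
            l_gz = pickle.load(f)

    Writing works the same way as in the cells above: both BZ2File and GzipFile expose a file-like object, so the compression is completely transparent to pickle.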
