Compressing A Series of JSON Objects While Maintaining Serial Reading?

梦谈多话 2020-12-30 15:53

I have a bunch of JSON objects that I need to compress, as they're eating too much disk space: approximately 20 GB for a few million of them.

2 Answers
  •  不思量自难忘°
     2020-12-30 16:40

    Just use a gzip.GzipFile() object and treat it like a regular file; write JSON objects line by line, and read them line by line.

    The object takes care of compression transparently, and will buffer reads, decompressing chunks as needed.

    import gzip
    import json
    
    # writing: GzipFile works in bytes, so encode each JSON line
    with gzip.GzipFile(jsonfilename, 'w') as outfile:
        for obj in objects:
            outfile.write((json.dumps(obj) + '\n').encode('utf-8'))
    
    # reading: iterate line by line; json.loads() accepts bytes (Python 3.6+)
    with gzip.GzipFile(jsonfilename, 'r') as infile:
        for line in infile:
            obj = json.loads(line)
            # process obj
    

    This has the added advantage that the compression algorithm can exploit repetition across objects, giving better compression ratios.
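
    For a quick sanity check, here is a small sketch with made-up sample data that compares one shared gzip stream against compressing each object separately; the exact numbers vary, but the shared stream should come out several times smaller:

    import gzip
    import json
    
    # hypothetical sample data: a few thousand objects sharing the same keys
    objects = [{'id': i, 'name': 'user%d' % i, 'active': True} for i in range(5000)]
    lines = [json.dumps(obj) for obj in objects]
    
    # one compressed stream over all objects...
    together = len(gzip.compress('\n'.join(lines).encode('utf-8')))
    
    # ...versus compressing each object on its own
    separate = sum(len(gzip.compress(line.encode('utf-8'))) for line in lines)
    
    print(together, separate)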
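
    If new objects keep arriving, gzip also supports append mode: each append writes a new gzip member to the file, and the reading loop above decompresses the concatenated members as a single stream. A minimal sketch, reusing the same jsonfilename (here via gzip.open(), which adds text-mode support on top of GzipFile):

    import gzip
    import json
    
    # 'at' appends in text mode; new_objects is a hypothetical new batch
    new_objects = [{'id': 1}]
    with gzip.open(jsonfilename, 'at', encoding='utf-8') as outfile:
        for obj in new_objects:
            outfile.write(json.dumps(obj) + '\n')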
