MemoryError using json.dumps()

醉酒成梦 2020-12-18 07:21

I would like to know which of json.dump() or json.dumps() is more efficient when it comes to encoding a large array to JSON format.

2 Answers
  • 2020-12-18 07:35

    You can simply replace

    f.write(json.dumps(mytab,default=dthandler,indent=4))
    

    by

    json.dump(mytab, f, default=dthandler, indent=4)
    

    This should "stream" the data into the file.
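
    As a minimal, self-contained sketch (assuming mytab and dthandler are the array and date handler from your original code, and 'output.json' is a placeholder path):

    import json

    with open('output.json', 'w') as f:
        # json.dump() encodes straight to the file object, so the complete
        # JSON string never has to exist in memory as a single Python str.
        json.dump(mytab, f, default=dthandler, indent=4)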

  • 2020-12-18 07:39

    json.dumps() builds the entire JSON string in memory before anything is written to disk, which is why the MemoryError occurs.

    To get around this problem, use json.JSONEncoder().iterencode():

    import json

    with open(filepath, 'w') as f:
        # iterencode() yields the encoded output piece by piece instead of
        # building one huge string, so each chunk can be written immediately.
        for chunk in json.JSONEncoder().iterencode(object_to_encode):
            f.write(chunk)
    

    Note, however, that this will generally take quite a while, since it writes the output in many small chunks rather than all at once (one way to batch those writes is sketched below).
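
    If the many tiny writes turn out to be the bottleneck, one possible mitigation is to batch the chunks before writing. This is only a sketch (iterencode_to_file and buffer_size are made-up names, not part of the json module):

    import json

    def iterencode_to_file(obj, f, buffer_size=2 ** 16):
        # Accumulate iterencode() chunks until roughly buffer_size characters
        # are buffered, then write them with a single f.write() call. Memory
        # use stays bounded while the number of write calls drops.
        buf, buf_len = [], 0
        for chunk in json.JSONEncoder().iterencode(obj):
            buf.append(chunk)
            buf_len += len(chunk)
            if buf_len >= buffer_size:
                f.write(''.join(buf))
                buf, buf_len = [], 0
        if buf:
            f.write(''.join(buf))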


    Special case:

    I had a Python object that is a list of dicts, like so:

    [
        { "prop": 1, "attr": 2 },
        { "prop": 3, "attr": 4 }
        # ...
    ]
    

    I could json.dumps() the individual objects, but dumping the whole list raised a MemoryError. To speed up writing, I opened the file and wrote the JSON delimiters manually:

    import json

    # Assumes list_of_dicts is non-empty.
    with open(filepath, 'w') as f:
        f.write('[')
    
        # Dump every element but the last, each followed by a comma.
        for obj in list_of_dicts[:-1]:
            json.dump(obj, f)
            f.write(',')
    
        # The last element gets no trailing comma before the closing bracket.
        json.dump(list_of_dicts[-1], f)
        f.write(']')
    

    You can probably get away with something like that if you know your JSON object structure beforehand. For general use, just use json.JSONEncoder().iterencode().
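
    If you do know the structure, the manual-delimiter code above can be wrapped in a small reusable helper. A sketch (dump_list_streaming is a hypothetical name; this version also handles an empty list):

    import json

    def dump_list_streaming(objs, f):
        # Write objs to f as a JSON array, one element at a time,
        # so the whole list is never encoded as a single string.
        f.write('[')
        for i, obj in enumerate(objs):
            if i:
                f.write(',')
            json.dump(obj, f)
        f.write(']')

    with open(filepath, 'w') as f:
        dump_list_streaming(list_of_dicts, f)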
