I would like to know which of json.dump() or json.dumps() is the most efficient when it comes to encoding a large array to JSON format.
The json module will allocate the entire JSON string in memory before writing, which is why a MemoryError occurs.
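For context, here is a minimal sketch of the two calls the question compares (big_list is a stand-in name for the large array): json.dumps() hands you the whole encoded string, while json.dump() encodes straight into the file object.

import json

big_list = [{"prop": i, "attr": i + 1} for i in range(2)]  # stand-in for the large array

# json.dumps() returns the complete encoded string, so the full result sits in memory
with open('out_dumps.json', 'w') as f:
    f.write(json.dumps(big_list))

# json.dump() encodes straight into the file object instead of returning a string
with open('out_dump.json', 'w') as f:
    json.dump(big_list, f)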
To get around this problem, use json.JSONEncoder().iterencode():
import json

with open(filepath, 'w') as f:
    for chunk in json.JSONEncoder().iterencode(object_to_encode):
        f.write(chunk)
However, note that this will generally take quite a while, since it writes many small chunks rather than everything at once.
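If the many small writes become a bottleneck, one option is to batch the chunks before writing them out; this is just a sketch, and write_json_batched and batch_size are names invented here, not part of the json API:

import json

def write_json_batched(filepath, object_to_encode, batch_size=1024):
    # stream-encode with iterencode, but flush to disk in larger joined batches
    with open(filepath, 'w') as f:
        buffer = []
        for chunk in json.JSONEncoder().iterencode(object_to_encode):
            buffer.append(chunk)
            if len(buffer) >= batch_size:
                f.write(''.join(buffer))
                buffer = []
        # write any leftover chunks
        f.write(''.join(buffer))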
Special case:
I had a Python object that was a list of dicts, like so:
[
    { "prop": 1, "attr": 2 },
    { "prop": 3, "attr": 4 }
    # ...
]
I could json.dumps() individual objects, but dumping the whole list raised a MemoryError.
To speed up writing, I opened the file and wrote the JSON delimiters manually:
import json

with open(filepath, 'w') as f:
    f.write('[')
    # every object except the last is followed by a comma separator
    for obj in list_of_dicts[:-1]:
        json.dump(obj, f)
        f.write(',')
    json.dump(list_of_dicts[-1], f)
    f.write(']')
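As a side note, the snippet above assumes list_of_dicts is non-empty (list_of_dicts[-1] would raise an IndexError otherwise); a separator flag, sketched below, avoids that assumption:

import json

with open(filepath, 'w') as f:
    f.write('[')
    first = True
    for obj in list_of_dicts:
        if not first:
            f.write(',')  # comma before every object except the first
        json.dump(obj, f)
        first = False
    f.write(']')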
You can probably get away with something like that if you know your JSON object structure beforehand. For general use, just use json.JSONEncoder().iterencode().