I would like to know which of json.dump() or json.dumps() is more efficient when it comes to encoding a large array to JSON format.
You can simply replace
f.write(json.dumps(mytab, default=dthandler, indent=4))
with
json.dump(mytab, f, default=dthandler, indent=4)
This should "stream" the data into the file.
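For illustration, here is a minimal, self-contained sketch; the dthandler implementation and the sample mytab data are my own assumptions, since only their names appear above:

import json
from datetime import datetime

# Assumed handler for the dthandler referenced above: serialize datetime
# objects as ISO 8601 strings and reject anything else.
def dthandler(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

mytab = [{"id": 1, "created": datetime.now()},
         {"id": 2, "created": datetime.now()}]

with open("out.json", "w") as f:
    # json.dump() writes piece by piece to the file object, whereas
    # json.dumps() would first build the whole string in memory.
    json.dump(mytab, f, default=dthandler, indent=4)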
The json module will allocate the entire JSON string in memory before writing, which is why a MemoryError occurs.
To get around this problem, use json.JSONEncoder().iterencode():
with open(filepath, 'w') as f:
    for chunk in json.JSONEncoder().iterencode(object_to_encode):
        f.write(chunk)
However, note that this will generally take quite a while, since it writes many small chunks rather than everything at once.
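If the per-chunk writes become the bottleneck, one workaround (a sketch of my own, not part of the original answer; the dump_in_batches name and batch_chars parameter are made up) is to buffer the chunks and write them in larger batches:

import json

def dump_in_batches(obj, fp, batch_chars=2 ** 16):
    # Stream obj through iterencode, but flush roughly batch_chars
    # characters per write instead of one tiny chunk at a time.
    buffer = []
    buffered = 0
    for chunk in json.JSONEncoder().iterencode(obj):
        buffer.append(chunk)
        buffered += len(chunk)
        if buffered >= batch_chars:
            fp.write(''.join(buffer))
            buffer = []
            buffered = 0
    if buffer:
        fp.write(''.join(buffer))

Memory use stays bounded by the batch size plus the largest single chunk, so you keep the streaming behaviour while cutting down the number of write calls.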
Special case:
I had a Python object that was a list of dicts, like so:
[
    { "prop": 1, "attr": 2 },
    { "prop": 3, "attr": 4 }
    # ...
]
I could json.dumps() individual objects, but dumping the whole list generated a MemoryError.
To speed up writing, I opened the file and wrote the JSON delimiters manually:
with open(filepath, 'w') as f:
    f.write('[')
    for obj in list_of_dicts[:-1]:
        json.dump(obj, f)
        f.write(',')
    json.dump(list_of_dicts[-1], f)
    f.write(']')
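A small refinement of that loop (my own sketch, not from the original answer): writing the comma before every element except the first avoids copying the list with [:-1] and also works for an empty list:

import json

def dump_list_of_dicts(list_of_dicts, filepath):
    # Hypothetical helper: stream the list element by element, emitting
    # the brackets and commas by hand so the full list is never
    # serialized as a single in-memory string.
    with open(filepath, 'w') as f:
        f.write('[')
        for i, obj in enumerate(list_of_dicts):
            if i:
                f.write(',')
            json.dump(obj, f)
        f.write(']')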
You can probably get away with something like that if you know your JSON object structure beforehand. For general use, just use json.JSONEncoder().iterencode().