I recently asked a question about how to save large Python objects to file; I had previously run into problems converting massive Python dictionaries into strings and writing them to disk.
I'd just expand on phihag's answer.
When trying to serialize an object approaching the size of RAM, pickle/cPickle should be avoided, since serializing requires an additional 1-2 times the size of the object in memory. That's true even when streaming the output to a BZ2File. In my case I was even running out of swap space.
But the problem with JSON (and similarly with HDF files, as mentioned in the linked article) is that it cannot serialize tuples, which in my data are used as dict keys. There is no great solution for this; the best I could find was to convert the tuples to strings, which requires some memory of its own, but far less than pickle. Nowadays you can also use the ujson library, which is much faster than the standard json library.
For tuples composed of strings (this requires the strings to contain no commas):

import ujson as json   # drop-in, faster replacement for the standard json module
import bz2

bigdata = {('a', 'b', 'c'): 25, ('d', 'e'): 13}
# JSON object keys must be strings, so join each tuple into one comma-separated string
bigdata = {','.join(k): v for k, v in bigdata.items()}

# open in text mode ('wt') so json.dump can write str to the compressed stream
with bz2.open('filename.json.bz2', 'wt') as f:
    json.dump(bigdata, f)
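To read the data back, the load is just the mirror image of the save (a minimal sketch, assuming the same filename.json.bz2 written above):

import ujson as json
import bz2

# open in text mode ('rt') so json.load receives str from the decompressed stream
with bz2.open('filename.json.bz2', 'rt') as f:
    bigdata = json.load(f)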
To re-compose the tuples:

bigdata = {tuple(k.split(',')): v for k, v in bigdata.items()}
Alternatively, if e.g. your keys are 2-tuples of integers:

bigdata2 = {(1, 2): 1.2, (2, 3): 3.4}
# encode each key as 'i,j' so it becomes a valid JSON string key
bigdata2 = {'%d,%d' % k: v for k, v in bigdata2.items()}
# ... save, load ...
bigdata2 = {tuple(map(int, k.split(','))): v for k, v in bigdata2.items()}
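If you need this round-trip in more than one place, it can be convenient to wrap the key conversion in a pair of small helpers. This is just a sketch for the string-tuple case, under the same no-commas assumption; the function names are my own:

import ujson as json
import bz2

def save_tuple_keyed_dict(d, path):
    # JSON keys must be strings, so join each tuple key with commas before dumping
    with bz2.open(path, 'wt') as f:
        json.dump({','.join(k): v for k, v in d.items()}, f)

def load_tuple_keyed_dict(path):
    # split the comma-joined keys back into tuples after loading
    with bz2.open(path, 'rt') as f:
        return {tuple(k.split(',')): v for k, v in json.load(f).items()}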
Another advantage of this approach over pickle is that JSON appears to compress significantly better than pickle output when using bzip2 compression.
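If you want a rough check of that on your own data, you can compare the bzip2-compressed sizes on a small sample (illustrative only; the ratio will depend heavily on the data, and the sample dict here is made up):

import bz2
import pickle
import ujson as json

sample = {'%d,%d' % (i, i + 1): float(i) for i in range(10000)}   # small stand-in for the real data
json_bz2 = bz2.compress(json.dumps(sample).encode('utf-8'))
pickle_bz2 = bz2.compress(pickle.dumps(sample, protocol=pickle.HIGHEST_PROTOCOL))
print(len(json_bz2), len(pickle_bz2))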