There are many solutions to serialize a small dictionary: json.loads/json.dumps, pickle, shelve, ujson, or e
LMDB (Lightning Memory-Mapped Database) is a very fast key-value store which has Python bindings and can handle huge database files easily.
There is also the lmdbm wrapper which offers the Pythonic d[key] = value syntax.
By default it only supports byte values, but it can easily be extended to use a serializer (json, msgpack, pickle) for other kinds of values.
import json

from lmdbm import Lmdb


class JsonLmdb(Lmdb):
    def _pre_key(self, value):
        return value.encode("utf-8")

    def _post_key(self, value):
        return value.decode("utf-8")

    def _pre_value(self, value):
        return json.dumps(value).encode("utf-8")

    def _post_value(self, value):
        return json.loads(value.decode("utf-8"))


with JsonLmdb.open("test.db", "c") as db:
    db["key"] = {"some": "object"}
    obj = db["key"]
    print(obj["some"])  # prints "object"
Here are some benchmarks. lmdbm and sqlitedict were written with batched inserts (1000 items per batch); without batching, their write performance suffers considerably because each insert opens a new transaction by default. dbm refers to the stdlib dbm.dumb. Tested on Windows 7, Python 3.8, with an SSD.
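To illustrate what batching means here, the following is a minimal sketch, not the code from the benchmark script. The helper names `iter_batches` and `insert_batched` are hypothetical, and it assumes that the store's `update()` writes each batch in a single transaction (which is my understanding of lmdbm, but check the library before relying on it). It works with any dict-like object.

```python
from itertools import islice


def iter_batches(pairs, batch_size):
    """Yield dicts holding at most batch_size (key, value) pairs each."""
    iterator = iter(pairs)
    while True:
        batch = dict(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def insert_batched(db, pairs, batch_size=1000):
    """Write pairs into db one batch at a time; return the number of batches."""
    num_batches = 0
    for batch in iter_batches(pairs, batch_size):
        # Assumption: for lmdbm this call is one write transaction per batch,
        # instead of one transaction per item as with db[key] = value.
        db.update(batch)
        num_batches += 1
    return num_batches
```

With the JsonLmdb class above this would be called as `insert_batched(db, ((str(i), {"n": i}) for i in range(100000)))` inside the `with JsonLmdb.open(...)` block.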
Continuous writes, in seconds:
| items | lmdbm | pysos |sqlitedict| dbm |
|------:|------:|------:|---------:|--------:|
| 10| 0.0000| 0.0000| 0.01600| 0.01600|
| 100| 0.0000| 0.0000| 0.01600| 0.09300|
| 1000| 0.0320| 0.0460| 0.21900| 0.84200|
| 10000| 0.1560| 2.6210| 2.09100| 8.42400|
| 100000| 1.5130| 4.9140| 20.71700| 86.86200|
|1000000|18.1430|48.0950| 208.88600|878.16000|
Random reads, in seconds:
| items | lmdbm | pysos |sqlitedict| dbm |
|------:|------:|------:|---------:|-------:|
| 10| 0.0000| 0.000| 0.0000| 0.0000|
| 100| 0.0000| 0.000| 0.0630| 0.0150|
| 1000| 0.0150| 0.016| 0.4990| 0.1720|
| 10000| 0.1720| 0.250| 4.2430| 1.7470|
| 100000| 1.7470| 3.588| 49.3120| 18.4240|
|1000000|17.8150| 38.454| 516.3170|196.8730|
For the benchmark script see https://github.com/Dobatymo/lmdb-python-dbm/blob/master/benchmark.py
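If you just want a rough feel for how such numbers are obtained, here is a minimal timing sketch for the stdlib dbm.dumb only. It is not the linked script: the function name `time_writes_and_reads`, the 100-byte payload, and the key format are assumptions made for illustration, so absolute numbers will not match the tables above.

```python
import dbm.dumb
import os
import random
import tempfile
import time


def time_writes_and_reads(num_items, seed=0):
    """Return (write_seconds, read_seconds) for num_items entries in dbm.dumb."""
    keys = [f"key-{i}".encode("utf-8") for i in range(num_items)]
    value = b"x" * 100  # hypothetical payload, not taken from the benchmark

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bench")

        # Continuous writes: insert keys in order, including the final close.
        start = time.perf_counter()
        with dbm.dumb.open(path, "c") as db:
            for key in keys:
                db[key] = value
        write_seconds = time.perf_counter() - start

        # Random reads: look up every key once, in shuffled order.
        shuffled = keys[:]
        random.Random(seed).shuffle(shuffled)
        start = time.perf_counter()
        with dbm.dumb.open(path, "r") as db:
            for key in shuffled:
                _ = db[key]
        read_seconds = time.perf_counter() - start

    return write_seconds, read_seconds


if __name__ == "__main__":
    for n in (10, 100, 1000):
        w, r = time_writes_and_reads(n)
        print(f"{n:>7} items: write {w:.4f}s, read {r:.4f}s")
```

`time.perf_counter()` is used because it is a high-resolution clock, which matters for the small item counts.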