Key: value store in Python for possibly 100 GB of data, without client/server

猫巷女王i 2020-12-25 14:26

There are many solutions to serialize a small dictionary: json.loads/json.dumps, pickle, shelve, ujson, or e

6 answers
  •  盖世英雄少女心
    2020-12-25 15:06

    LMDB (Lightning Memory-Mapped Database) is a very fast key-value store which has Python bindings and can handle huge database files easily.

    There is also the lmdbm wrapper, which offers the Pythonic d[key] = value syntax.

    By default it only supports byte values, but it can easily be extended to use a serializer (json, msgpack, pickle) for other kinds of values.

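    import json
    from lmdbm import Lmdb
    
    class JsonLmdb(Lmdb):
        """Lmdb subclass that stores str keys and JSON-serializable values."""
    
        # lmdbm calls the _pre_* hooks to turn keys/values into bytes before
        # writing, and the _post_* hooks to turn stored bytes back on reading.
        def _pre_key(self, key):
            return key.encode("utf-8")
    
        def _post_key(self, key):
            return key.decode("utf-8")
    
        def _pre_value(self, value):
            return json.dumps(value).encode("utf-8")
    
        def _post_value(self, value):
            return json.loads(value.decode("utf-8"))
    
    # "c": open for reading and writing, creating the database if it does not exist
    with JsonLmdb.open("test.db", "c") as db:
        db["key"] = {"some": "object"}
        obj = db["key"]
        print(obj["some"])  # prints: object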
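
    lmdbm and the LMDB bindings are third-party packages. To show the same hook idea with the standard library only, here is a minimal sketch of a hypothetical JsonDbm class (the name is made up, not part of any library) that applies the same four conversions on top of the stdlib dbm.dumb module via collections.abc.MutableMapping. It only illustrates the pattern: the benchmarks in this answer show dbm.dumb is far slower than lmdbm, so it is no substitute for LMDB at the 100 GB scale.

```python
import dbm.dumb
import json
import os
import tempfile
from collections.abc import MutableMapping


class JsonDbm(MutableMapping):
    """Hypothetical str-key / JSON-value store on top of stdlib dbm.dumb.

    The _pre_*/_post_* names imitate lmdbm's hooks; this is NOT lmdbm's API.
    """

    def __init__(self, path, flag="c"):
        # dbm.dumb keeps its data in <path>.dat and its index in <path>.dir.
        # flag is passed straight to dbm.dumb.open ("c" creates if missing).
        self._db = dbm.dumb.open(path, flag)

    # Conversions between Python objects and the bytes stored on disk.
    def _pre_key(self, key):
        return key.encode("utf-8")

    def _post_key(self, key):
        return key.decode("utf-8")

    def _pre_value(self, value):
        return json.dumps(value).encode("utf-8")

    def _post_value(self, value):
        return json.loads(value.decode("utf-8"))

    # The five abstract methods MutableMapping requires; it derives get(),
    # update(), __contains__(), items(), etc. from these.
    def __getitem__(self, key):
        return self._post_value(self._db[self._pre_key(key)])

    def __setitem__(self, key, value):
        self._db[self._pre_key(key)] = self._pre_value(value)

    def __delitem__(self, key):
        del self._db[self._pre_key(key)]

    def __iter__(self):
        return (self._post_key(k) for k in self._db.keys())

    def __len__(self):
        return len(self._db)

    def close(self):
        self._db.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with JsonDbm(os.path.join(tmp, "test")) as db:
            db["key"] = {"some": "object"}
            print(db["key"]["some"])  # prints: object
```

    Swapping json for pickle or msgpack only means changing _pre_value and _post_value.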
    

    Benchmarks are below. For lmdbm and sqlitedict, inserts were batched (1000 items per batch); without batching, write performance for both suffers a lot, because by default each insert opens a new transaction. "dbm" refers to the stdlib dbm.dumb module. Tested on Windows 7, Python 3.8, SSD.
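
    sqlitedict stores its data in SQLite, so the cost of one transaction per insert can be illustrated with the stdlib sqlite3 module alone. The helpers below (open_store, put_unbatched, put_batched, and the kv table) are hypothetical names for this sketch, not sqlitedict's API, and the schema is a simple key/value layout that is not necessarily the one sqlitedict uses internally. The same rows are written either with one commit per row or with up to 1000 rows per transaction.

```python
import sqlite3

UPSERT = "REPLACE INTO kv (key, value) VALUES (?, ?)"


def open_store(path):
    # Hypothetical minimal key/value schema.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")
    conn.commit()
    return conn


def put_unbatched(conn, items):
    # One transaction (and one commit) per row. For an on-disk database,
    # each commit is typically synced to disk, which dominates the cost.
    for key, value in items:
        conn.execute(UPSERT, (key, value))
        conn.commit()


def put_batched(conn, items, batch_size=1000):
    # Group rows into transactions of up to batch_size rows each.
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            with conn:  # commits on success, rolls back on error
                conn.executemany(UPSERT, batch)
            batch.clear()
    if batch:  # flush the final, partial batch
        with conn:
            conn.executemany(UPSERT, batch)
```

    To see the effect, time both functions against an on-disk database file; with ":memory:" there is no disk sync, so the gap will likely be much smaller.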

    Continuous writes (time in seconds)

    | items | lmdbm | pysos |sqlitedict|   dbm   |
    |------:|------:|------:|---------:|--------:|
    |     10| 0.0000| 0.0000|   0.01600|  0.01600|
    |    100| 0.0000| 0.0000|   0.01600|  0.09300|
    |   1000| 0.0320| 0.0460|   0.21900|  0.84200|
    |  10000| 0.1560| 2.6210|   2.09100|  8.42400|
    | 100000| 1.5130| 4.9140|  20.71700| 86.86200|
    |1000000|18.1430|48.0950| 208.88600|878.16000|
    

    Random reads (time in seconds)

    | items | lmdbm | pysos |sqlitedict|  dbm   |
    |------:|------:|------:|---------:|-------:|
    |     10| 0.0000|  0.000|    0.0000|  0.0000|
    |    100| 0.0000|  0.000|    0.0630|  0.0150|
    |   1000| 0.0150|  0.016|    0.4990|  0.1720|
    |  10000| 0.1720|  0.250|    4.2430|  1.7470|
    | 100000| 1.7470|  3.588|   49.3120| 18.4240|
    |1000000|17.8150| 38.454|  516.3170|196.8730|
    

    The benchmark script is available at https://github.com/Dobatymo/lmdb-python-dbm/blob/master/benchmark.py
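
    The linked script is the authoritative source for how the numbers above were produced. As a rough, hypothetical sketch of what the two tables measure, assuming "continuous writes" means writing keys one after another and "random reads" means reading them back in shuffled order, a harness could look like this. time_writes and time_random_reads are made-up names, and unlike the lmdbm/sqlitedict figures above, the writes here are not batched. It works with any dict-like object that accepts str keys and values, such as a plain dict or dbm.dumb.

```python
import random
import time


def time_writes(db, n):
    """Seconds to write n keys in order (hypothetical harness)."""
    start = time.perf_counter()
    for i in range(n):
        db[str(i)] = str(i)
    return time.perf_counter() - start


def time_random_reads(db, n, seed=0):
    """Seconds to read back the n keys written by time_writes, in shuffled order."""
    keys = [str(i) for i in range(n)]
    random.Random(seed).shuffle(keys)  # shuffle before the clock starts
    start = time.perf_counter()
    for key in keys:
        _ = db[key]
    return time.perf_counter() - start


if __name__ == "__main__":
    import dbm.dumb
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        with dbm.dumb.open(os.path.join(tmp, "bench"), "c") as db:
            n = 1000
            print(f"writes: {time_writes(db, n):.4f} s")
            print(f"random reads: {time_random_reads(db, n):.4f} s")
```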
