Key: value store in Python for possibly 100 GB of data, without client/server


猫巷女王i 2020-12-25 14:26

There are many solutions to serialize a small dictionary: json.loads/json.dumps, pickle, shelve, ujson, or e

6 answers

盖世英雄少女心 (original poster)

2020-12-25 15:06

LMDB (Lightning Memory-Mapped Database) is a very fast key-value store which has Python bindings and can handle huge database files easily.

There is also the lmdbm wrapper which offers the Pythonic d[key] = value syntax.

By default it only supports byte values, but it can easily be extended to use a serializer (json, msgpack, pickle) for other kinds of values.

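import json

from lmdbm import Lmdb


class JsonLmdb(Lmdb):
    """Lmdb subclass that stores str keys and JSON-serializable values."""

    def _pre_key(self, value):
        # str key -> bytes before it is written
        return value.encode("utf-8")

    def _post_key(self, value):
        # bytes key -> str when it is read back
        return value.decode("utf-8")

    def _pre_value(self, value):
        # Python object -> JSON text -> bytes
        return json.dumps(value).encode("utf-8")

    def _post_value(self, value):
        # bytes -> JSON text -> Python object
        return json.loads(value.decode("utf-8"))


with JsonLmdb.open("test.db", "c") as db:
    db["key"] = {"some": "object"}
    obj = db["key"]
    print(obj["some"])  # prints "object"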
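The byte-level round trip that such serializer hooks perform can be checked without a database. The following is a minimal stdlib-only sketch; the helper names (json_pre_value, pickle_pre_value, and so on) are hypothetical and only mirror the roles of _pre_value and _post_value above, they are not part of lmdbm.

```python
import json
import pickle


# Hypothetical helpers mirroring the _pre_value/_post_value roles.
def json_pre_value(value):
    """Serialize a JSON-compatible object to bytes."""
    return json.dumps(value).encode("utf-8")


def json_post_value(raw):
    """Deserialize bytes produced by json_pre_value."""
    return json.loads(raw.decode("utf-8"))


def pickle_pre_value(value):
    """Serialize any picklable object to bytes."""
    return pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)


def pickle_post_value(raw):
    """Deserialize bytes produced by pickle_pre_value."""
    return pickle.loads(raw)


record = {"some": "object", "count": 3}
assert json_post_value(json_pre_value(record)) == record
assert pickle_post_value(pickle_pre_value(record)) == record

# pickle keeps Python types that JSON cannot represent ...
typed = {"pair": (1, 2), "tags": {"a", "b"}}
assert pickle_post_value(pickle_pre_value(typed)) == typed

# ... while JSON turns a tuple into a list on the way back.
assert json_post_value(json_pre_value({"pair": (1, 2)})) == {"pair": [1, 2]}
```

JSON output is portable and human-readable but limited to JSON types; pickle handles more Python types but should only be used to load data from a trusted source.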

Some benchmarks follow. lmdbm and sqlitedict were tested with batched inserts (1000 items per batch); without batching, their write performance drops sharply because each insert opens a new transaction by default. "dbm" refers to the standard library's dbm.dumb. Tested on Windows 7, Python 3.8, SSD.

continuous writes in seconds

| items | lmdbm | pysos |sqlitedict|   dbm   |
|------:|------:|------:|---------:|--------:|
|     10| 0.0000| 0.0000|   0.01600|  0.01600|
|    100| 0.0000| 0.0000|   0.01600|  0.09300|
|   1000| 0.0320| 0.0460|   0.21900|  0.84200|
|  10000| 0.1560| 2.6210|   2.09100|  8.42400|
| 100000| 1.5130| 4.9140|  20.71700| 86.86200|
|1000000|18.1430|48.0950| 208.88600|878.16000|
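The effect of batching on write times can be illustrated with the standard library's sqlite3 module. This is only a sketch of the general idea (one transaction per insert versus one transaction per batch); it uses plain sqlite3 as a stand-in and is not the sqlitedict or lmdbm API, and the table and function names are made up for the example.

```python
import sqlite3
import time


def make_db():
    # In-memory database so the sketch leaves no files behind.
    # On-disk databases typically show a larger gap between the two modes.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
    return conn


def insert_one_by_one(conn, items):
    # One transaction per insert: commit after every row.
    for key, value in items:
        conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))
        conn.commit()


def insert_batched(conn, items, batch_size=1000):
    # One transaction per batch: commit once every batch_size rows.
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        with conn:  # commits on success, rolls back on error
            conn.executemany("INSERT OR REPLACE INTO kv VALUES (?, ?)", batch)


items = [(f"key{i}", f"value{i}") for i in range(10_000)]

for insert in (insert_one_by_one, insert_batched):
    conn = make_db()
    start = time.perf_counter()
    insert(conn, items)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
    print(f"{insert.__name__}: {count} rows in {elapsed:.4f} s")
    conn.close()
```

Timings depend on the machine and storage, so no expected output is shown.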

random reads in seconds

| items | lmdbm | pysos |sqlitedict|  dbm   |
|------:|------:|------:|---------:|-------:|
|     10| 0.0000|  0.000|    0.0000|  0.0000|
|    100| 0.0000|  0.000|    0.0630|  0.0150|
|   1000| 0.0150|  0.016|    0.4990|  0.1720|
|  10000| 0.1720|  0.250|    4.2430|  1.7470|
| 100000| 1.7470|  3.588|   49.3120| 18.4240|
|1000000|17.8150| 38.454|  516.3170|196.8730|

For the benchmark script see https://github.com/Dobatymo/lmdb-python-dbm/blob/master/benchmark.py
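For readers who want to try a comparable measurement without installing anything, here is a minimal stdlib-only sketch that times continuous writes and random reads against dbm.dumb. It is not the linked benchmark script; the function name, value format, and item counts are assumptions chosen for illustration.

```python
import dbm.dumb
import os
import random
import tempfile
import time


def time_writes_and_reads(num_items, seed=0):
    """Return (write_seconds, read_seconds) for dbm.dumb with num_items keys."""
    keys = [str(i).encode("ascii") for i in range(num_items)]
    with tempfile.TemporaryDirectory() as tmp:
        db = dbm.dumb.open(os.path.join(tmp, "bench"), "c")
        try:
            # Continuous writes: insert every key once, in order.
            start = time.perf_counter()
            for key in keys:
                db[key] = b"value-" + key
            write_seconds = time.perf_counter() - start

            # Random reads: fetch every key once, in shuffled order.
            shuffled = keys[:]
            random.Random(seed).shuffle(shuffled)
            start = time.perf_counter()
            for key in shuffled:
                _ = db[key]
            read_seconds = time.perf_counter() - start
        finally:
            db.close()  # close before the temporary directory is removed
    return write_seconds, read_seconds


for n in (10, 100, 1000):
    w, r = time_writes_and_reads(n)
    print(f"{n:>7} items: write {w:.4f} s, read {r:.4f} s")
```

Absolute numbers will differ from the tables above, since they depend on hardware, operating system, and Python version.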
