Persistent multiprocess shared cache in Python with stdlib or minimal dependencies

有刺的猬 2021-02-06 10:00

I just tried the Python shelve module as a persistent cache for data fetched from an external service. The complete example is here.

I was wondering what would be the best approach to a persistent, multiprocess shared cache in Python, using the stdlib or minimal dependencies.
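A minimal sketch of the kind of shelve-based cache meant here (my own reconstruction; the linked example is not shown):

    import shelve
    import time

    def fetch_cached(key, fetch, ttl=300, path="cache.shelve"):
        # shelve gives easy persistence, but no multiprocess locking:
        # concurrent writers can corrupt the underlying dbm file.
        with shelve.open(path) as db:
            entry = db.get(key)
            if entry is None or entry["expires_at"] < time.time():
                entry = {"value": fetch(key), "expires_at": time.time() + ttl}
                db[key] = entry
            return entry["value"]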

3 Answers
  •  刺人心 (OP)
     2021-02-06 10:32

    Let's consider your requirements systematically:

    minimum or no external dependencies

    Your use case will determine whether you can use in-band synchronisation (a file descriptor or memory region inherited across fork) or out-of-band synchronisation (POSIX file locks, System V shared memory).
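    As a minimal sketch of the two styles on POSIX (all names and paths here are illustrative, not from the question): an in-band multiprocessing.Lock inherited across fork, and an out-of-band fcntl lock on a well-known path that also works between unrelated processes.

        import fcntl
        import multiprocessing

        # In-band: a Lock created before fork() is inherited by the children,
        # so only the parent and its descendants can coordinate through it.
        ctx = multiprocessing.get_context("fork")   # fork start method, POSIX-only
        inherited_lock = ctx.Lock()

        def touch_cache_inband():
            with inherited_lock:        # serialises processes forked from here
                ...                     # read/write the shared cache

        # Out-of-band: an advisory POSIX lock on a well-known path coordinates
        # completely unrelated processes, at the cost of filesystem traffic.
        LOCK_PATH = "/tmp/cache.lock"   # illustrative path

        def touch_cache_outofband():
            with open(LOCK_PATH, "w") as fh:
                fcntl.lockf(fh, fcntl.LOCK_EX)      # blocks until the lock is free
                try:
                    ...                             # read/write the shared cache
                finally:
                    fcntl.lockf(fh, fcntl.LOCK_UN)  # also released on close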

    You may also have other requirements, e.g. cross-platform availability of the tools.

    There really isn't much in the standard library except bare tools. One module, however, stands out: sqlite3. SQLite uses fcntl/POSIX locks. There are performance limitations, though: multiple processes imply a file-backed database, and SQLite requires an fdatasync on each commit.

    Thus there's a limit on transactions per second in SQLite imposed by your hard drive's RPM. That is not a big deal if you have hardware RAID, but it can be a major handicap on commodity hardware, e.g. a laptop drive, a USB flash drive, or an SD card. Plan for ~100 transactions per second on a regular rotating hard drive.

    Your processes can also block on SQLite if you use certain transaction modes (e.g. BEGIN IMMEDIATE or BEGIN EXCLUSIVE).
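    To make this concrete, here is a minimal sketch of a persistent multiprocess cache on top of sqlite3. The schema and names are mine, not from the question; WAL mode with synchronous=NORMAL is one way to soften the fsync-per-commit cost described above, since WAL only syncs on checkpoints.

        import json
        import sqlite3
        import time

        DB_PATH = "cache.db"    # illustrative; any path visible to all processes

        # Each process should open its own connection (after any fork).
        conn = sqlite3.connect(DB_PATH, timeout=30)   # wait on locks, don't fail fast
        conn.execute("PRAGMA journal_mode=WAL")       # readers don't block the writer
        conn.execute("PRAGMA synchronous=NORMAL")     # sync on checkpoint, not per commit
        conn.execute("CREATE TABLE IF NOT EXISTS cache"
                     " (key TEXT PRIMARY KEY, value TEXT, expires_at REAL)")

        def cache_get(key):
            row = conn.execute("SELECT value, expires_at FROM cache WHERE key = ?",
                               (key,)).fetchone()
            if row is None or row[1] < time.time():
                return None                           # missing or expired
            return json.loads(row[0])

        def cache_put(key, value, ttl):
            with conn:                                # one transaction per put
                conn.execute("INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
                             (key, json.dumps(value), time.time() + ttl))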

    preventing thundering herd

    There are two major approaches for this:

    • probabilistically refresh cache item earlier than required, or
    • refresh only when required but block other callers

    Presumably if you trust another process with the cache value, you don't have any security considerations. Thus either will work, or perhaps a combination of both.
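    A sketch combining both, under stated assumptions: the probabilistic branch follows the X-Fetch scheme (refresh early with a probability that rises as the deadline nears, scaled by how long recomputation takes), and a cross-process lock covers the blocking branch. cache_get_meta, cache_put_meta and cross_process_lock are hypothetical helpers, e.g. built on the sqlite3 sketch above with an extra column recording the recomputation time.

        import math
        import random
        import time

        BETA = 1.0    # tune: > 1 refreshes earlier, < 1 closer to the deadline

        def get_or_refresh(key, recompute, ttl):
            now = time.time()
            entry = cache_get_meta(key)   # hypothetical: (value, delta, expires_at) or None
            if entry is not None:
                value, delta, expires_at = entry
                # X-Fetch: add an exponentially distributed head start, scaled
                # by recomputation cost (delta); log(1 - u) avoids log(0).
                if now - delta * BETA * math.log(1.0 - random.random()) < expires_at:
                    return value
            # Miss or early refresh: only one process recomputes; the others
            # block on the lock and then read the freshly stored value.
            with cross_process_lock(key):             # hypothetical, e.g. fcntl-based
                entry = cache_get_meta(key)           # re-check under the lock
                if entry is not None and entry[2] > time.time():
                    return entry[0]
                start = time.time()
                value = recompute()
                delta = time.time() - start           # feeds the probabilistic branch
                cache_put_meta(key, value, delta, start + ttl)   # hypothetical
            return value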
