I'm doing some queries in Python on a large database to get some stats out of it. I want these stats to be in-memory so other programs can use them without going back to the database.
Extremely late to the party, but pyfilesystem2 (with which I am not affiliated) seems to be a perfect fit:
https://pyfilesystem2.readthedocs.io
pip install fs

from fs import open_fs

mem_fs = open_fs(u'mem://')  # open an in-memory filesystem
...
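You can then treat mem_fs like any other filesystem. As a minimal sketch (writetext()/readtext() are part of the pyfilesystem2 FS API; the filename and contents here are purely illustrative):

from fs import open_fs

mem_fs = open_fs(u'mem://')
mem_fs.writetext('stats.txt', 'mean_age=34.7')   # write a text file into RAM
print(mem_fs.readtext('stats.txt'))              # 'mean_age=34.7'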
You could possibly use a database like SQLite. It's not strictly speaking in memory, but it is fairly light and would be completely separate from your main database.
It may not seem obvious, but pandas has a lot of relational capabilities. See the "Comparison with SQL" page in the pandas documentation.
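For instance, a join plus aggregation, the pandas equivalent of a SQL JOIN ... GROUP BY (the table and column names here are made up for illustration):

import pandas as pd

users = pd.DataFrame({'user_id': [1, 2, 3], 'city': ['NYC', 'LA', 'NYC']})
orders = pd.DataFrame({'user_id': [1, 1, 3], 'total': [20.0, 35.0, 12.5]})

# SELECT city, SUM(total) FROM users JOIN orders USING (user_id) GROUP BY city
merged = users.merge(orders, on='user_id')
print(merged.groupby('city')['total'].sum())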
I guess SQLite3 will be the best option, then.
If possible, take a look at memcached (key-value pairs, lightning fast!); see the sketch after the update below.
UPDATE 1:
HSQLDB for SQL-like tables (no Python support).
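memcached itself is language-agnostic; pymemcache is one common Python client. A minimal sketch, assuming a memcached server is already running on localhost:11211 (the key and value are illustrative):

from pymemcache.client.base import Client

client = Client(('localhost', 11211))
client.set('mean_age', '34.7')      # store a value under a key
print(client.get('mean_age'))       # b'34.7' (values come back as bytes)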
SQLite3 might work. The Python interface does support the in-memory implementation that the SQLite3 C API offers.
From the spec:
You can also supply the special name :memory: to create a database in RAM.
It's also relatively cheap with transactions, depending on what you are doing. To get going, just:
import sqlite3
conn = sqlite3.connect(':memory:')
You can then proceed like you were using a regular database.
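For example, a minimal sketch of creating and querying a table entirely in RAM (the table and column names are made up for illustration):

import sqlite3

conn = sqlite3.connect(':memory:')  # the database lives in RAM and vanishes when conn closes
conn.execute('CREATE TABLE stats (name TEXT, value REAL)')
conn.execute("INSERT INTO stats VALUES ('mean_age', 34.7)")
conn.commit()
for row in conn.execute('SELECT * FROM stats'):
    print(row)  # ('mean_age', 34.7)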
Depending on your data - if you can get by with key/value structures (strings, hashes, lists, sets, sorted sets, etc.) - Redis might be another option to explore (as you mentioned that you wanted to share the stats with other programs).
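A minimal sketch, assuming a Redis server on localhost:6379 and the redis-py client (the key name is illustrative):

import redis

r = redis.Redis(host='localhost', port=6379, db=0)
r.set('mean_age', 34.7)       # any process connected to this server sees the key
print(r.get('mean_age'))      # b'34.7' (values come back as bytes)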
In-memory databases usually do not support a memory-paging option (for the whole database or for certain tables), i.e., the total size of the database should be smaller than the available physical memory or the maximum shared-memory size.
Depending on your application, its data-access pattern, the size of the database, and the system memory available for it, you have a few choices:
a. Pickled Python Data in File System
This stores structured Python data (such as a list of dictionaries/lists/tuples/sets, or a dictionary of lists/pandas DataFrames/NumPy series) in pickled form so that it can be used immediately and conveniently once unpickled. AFAIK, Python does not implicitly use the file system as a backing store for in-memory objects, but the host operating system may swap out Python processes in favor of higher-priority ones. This is suitable for static data whose memory footprint is small compared to the available system memory. The pickled files can be copied to other computers and read by multiple dependent or independent processes on the same machine. The actual file or in-memory size carries some overhead beyond the raw size of the data. It is the fastest way to access the data, since the data sits in the Python process's own memory and no query-parsing step is needed.
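A minimal sketch with the standard-library pickle module (the filename and data are illustrative):

import pickle

stats = {'mean_age': 34.7, 'user_counts': [10, 42, 7]}

with open('stats.pkl', 'wb') as f:     # serialize to disk
    pickle.dump(stats, f)

with open('stats.pkl', 'rb') as f:     # any other Python process can load it back
    loaded = pickle.load(f)
print(loaded['mean_age'])              # 34.7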
b. In-memory Database
This stores dynamic or static data in memory. In-memory-capable libraries with Python bindings include Redis, sqlite3, Berkeley DB, rqlite, etc. Different in-memory databases offer different features.
c. Memory-mapped Database/Data Structure
This stores static or dynamic data that can be larger than the physical memory of the host operating system. Python developers can use APIs such as mmap.mmap() or numpy.memmap() to map files into the process's address space. The files can be organized into index and data sections so that records are found via index lookups; this is, in fact, the mechanism used by various database libraries. Developers can also implement custom techniques to access and update the data efficiently.
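A minimal numpy.memmap sketch (the filename, shape, and values are illustrative):

import numpy as np

# Create a memory-mapped array backed by a file on disk
arr = np.memmap('stats.dat', dtype='float64', mode='w+', shape=(1000,))
arr[:] = 0.0
arr[42] = 34.7
arr.flush()  # push changes through to the backing file

# Another process can map the same file read-only and see the data
view = np.memmap('stats.dat', dtype='float64', mode='r', shape=(1000,))
print(view[42])  # 34.7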