Selecting between shelve and sqlite for really large dictionary (Python)

Submitted by 冷暖自知 on 2019-12-03 16:56:14

Question


I have a large Python dictionary of vectors (150k vectors, 10k dimensions each) of floats that can't be loaded into memory, so I have to use one of two methods for storing it on disk and retrieving specific vectors when appropriate. The vectors will be created and stored once, but might be read many (thousands of) times, so efficient reading is really important. After some tests with the shelve module, I tend to believe that sqlite will be a better option for this kind of task, but before I start writing code I would like to hear some more opinions on this. For example, are there any other options besides these two that I'm not aware of?

Now, assuming we agree that the best option is sqlite, another question relates to the exact form of the table. I'm thinking of using a fine-grained structure with rows of the form vector_key, element_no, value to allow efficient pagination, instead of storing all 10k elements of a vector in the same record. I would really appreciate any suggestions on this issue.
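For concreteness, here is a minimal sketch of that fine-grained layout using the standard sqlite3 module (the table and column names vectors, vector_key, element_no and value are illustrative, not fixed by anything above):

```python
import sqlite3

conn = sqlite3.connect("vectors.db")

# One row per vector element; the composite primary key gives an index
# that makes "fetch vector k" and "fetch elements i..j of vector k" cheap.
conn.execute("""
    CREATE TABLE IF NOT EXISTS vectors (
        vector_key INTEGER NOT NULL,
        element_no INTEGER NOT NULL,
        value      REAL    NOT NULL,
        PRIMARY KEY (vector_key, element_no)
    )
""")

def store_vector(key, vector):
    # Write all elements of one vector inside a single transaction.
    with conn:
        conn.executemany(
            "INSERT INTO vectors (vector_key, element_no, value) VALUES (?, ?, ?)",
            ((key, i, v) for i, v in enumerate(vector)),
        )

def load_vector(key):
    # Read one vector back in element order.
    rows = conn.execute(
        "SELECT value FROM vectors WHERE vector_key = ? ORDER BY element_no",
        (key,),
    )
    return [v for (v,) in rows]
```

Note that at 150k vectors of 10k elements this layout holds 1.5 billion rows, so it is worth benchmarking against the coarse alternative of one BLOB per vector before committing to it.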


Answer 1:


You want sqlite3. Then, if you use an ORM like SQLAlchemy on top of it, you can easily grow later and switch to other backend databases.
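For example, here is a minimal sketch of such a model (the class and column names are illustrative, and it assumes a reasonably recent SQLAlchemy); moving to another backend is then just a matter of changing the connection URL:

```python
from sqlalchemy import Column, Float, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class VectorElement(Base):
    # One element of one vector; names are illustrative.
    __tablename__ = "vector_elements"
    vector_key = Column(Integer, primary_key=True)
    element_no = Column(Integer, primary_key=True)
    value = Column(Float, nullable=False)

# Only this URL changes if you later move to, say, PostgreSQL.
engine = create_engine("sqlite:///vectors.db")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(VectorElement(vector_key=0, element_no=0, value=3.14))
    session.commit()
```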

Shelve is more of a "toy" than something actually useful in production code.

The other point you are talking about is called normalization. I have personally never been very good at it, but this should explain it for you.

Just as an extra note, this shows performance failures of shelve vs. sqlite3.
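The linked comparison is not reproduced here, but a rough way to see the gap yourself is a micro-benchmark along these lines (a sketch only; the sizes are scaled down, and actual numbers depend heavily on your data and disk):

```python
import shelve
import sqlite3
import time

N, DIM = 1000, 100          # scaled-down test sizes
vec = [0.5] * DIM

# shelve: one pickled list per string key
t0 = time.perf_counter()
with shelve.open("bench_shelve") as db:
    for k in range(N):
        db[str(k)] = vec
print("shelve write:", time.perf_counter() - t0)

# sqlite3: one row per element, all inserts in one transaction
t0 = time.perf_counter()
conn = sqlite3.connect("bench.db")
conn.execute("CREATE TABLE IF NOT EXISTS v (k INTEGER, i INTEGER, x REAL)")
with conn:
    conn.executemany(
        "INSERT INTO v VALUES (?, ?, ?)",
        ((k, i, x) for k in range(N) for i, x in enumerate(vec)),
    )
print("sqlite3 write:", time.perf_counter() - t0)
```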




Answer 2:


As you are dealing with numeric vectors, you may find PyTables an interesting alternative.
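A minimal sketch of the idea, assuming the PyTables package (imported as tables) plus NumPy, with illustrative file and array names:

```python
import numpy as np
import tables

N_VECTORS, DIM = 150000, 10000

# A chunked, compressed on-disk array of float32s.
with tables.open_file("vectors.h5", mode="w") as f:
    carray = f.create_carray(
        f.root, "vectors",
        atom=tables.Float32Atom(),
        shape=(N_VECTORS, DIM),
        filters=tables.Filters(complevel=5, complib="blosc"),
    )
    carray[0, :] = np.random.rand(DIM).astype(np.float32)

# Reading a single vector only touches the chunks it lives in.
with tables.open_file("vectors.h5", mode="r") as f:
    v = f.root.vectors[0, :]   # a 10k-element NumPy array
```

Because reads come back as NumPy arrays directly, this avoids the per-element row overhead of the fine-grained SQL layout.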



Source: https://stackoverflow.com/questions/10896395/selecting-between-shelve-and-sqlite-for-really-large-dictionary-python
