Shelve is probably not a good choice, however...
You might try using klepto or joblib. Both are good at caching results, and can use efficient storage formats.
Both joblib and klepto can save your results to a file on disk or to a directory. Both can also leverage numpy's internal storage format and/or compression on save… and both can save to memory-mapped files, if you like.
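For example, here's a minimal joblib sketch (the filenames and the array are just made up for illustration):

import numpy as np
import joblib

results = np.random.random((100000, 3))   # stand-in for your real results

# compressed save (compress=3 is a moderate zlib level)
joblib.dump(results, 'results_compressed.pkl', compress=3)

# uncompressed save, so the file can be memory-mapped on load
joblib.dump(results, 'results.pkl')

loaded = joblib.load('results_compressed.pkl')        # read it all back into RAM
mmapped = joblib.load('results.pkl', mmap_mode='r')   # or memory-map it instead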
If you use klepto, it uses the dictionary key as the filename and saves the value as the file's contents. With klepto, you can also pick whether you want pickle, json, or some other storage format.
Python 2.7.7 (default, Jun 2 2014, 01:33:50)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import klepto
>>> data_dict = klepto.archives.dir_archive('storage', cached=False, serialized=True)
>>> import string
>>> import random
>>> for j in string.ascii_letters:
...   for k in range(1000):
...     data_dict.setdefault(j, []).append([int(10*random.random()) for i in range(3)])
...
>>>
This will create a directory called storage that contains one pickled file per key of your data_dict. There are keywords for using memmap files, and also for the compression level. If you choose cached=True instead, then rather than writing to disk every time you write to data_dict, each write goes to an in-memory cache… you can then call data_dict.dump() to flush to disk whenever you choose, or set a memory limit that triggers a dump to disk when you hit it. You can also pick a caching algorithm (like lru or lfu) to decide which keys get purged from memory and dumped to disk.
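If it helps, here's a rough sketch of the cached=True workflow, with placeholder keys and values (I'm only showing dump and load here, not the memory-limit or lru/lfu options):

from klepto.archives import dir_archive

# writes go to the in-memory cache; dump() pushes them to the files on disk
data_dict = dir_archive('storage', cached=True, serialized=True)
data_dict['alpha'] = [1, 2, 3]
data_dict['beta'] = [4, 5, 6]
data_dict.dump()

# later (say, in a fresh session), pull archived entries back into memory
data_dict = dir_archive('storage', cached=True, serialized=True)
data_dict.load('alpha')   # load a single key from disk...
data_dict.load()          # ...or load everything in the archive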
Get klepto here: https://github.com/uqfoundation
or get joblib here: https://github.com/joblib/joblib
If you refactor, you could probably come up with a way to do this that takes advantage of a pre-allocated array; whether that's worth it will depend on the profile of how your code runs.
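As a rough, untested illustration, the loop above could be reworked to fill a pre-allocated numpy array per key instead of appending lists row by row (the shapes and the key-per-letter layout just mirror the earlier example):

import numpy as np
import random
import string
from klepto.archives import dir_archive

data_dict = dir_archive('storage', cached=True, serialized=True)

for j in string.ascii_letters:
    block = np.empty((1000, 3), dtype=int)   # pre-allocate 1000 rows of 3 ints
    for k in range(1000):
        block[k] = [int(10 * random.random()) for i in range(3)]
    data_dict[j] = block                     # store one whole array per key
data_dict.dump()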
Does opening and closing files affect run time? Yes. If you use klepto, you can set the granularity of when you dump to disk, and then pick your own trade-off between speed and intermediate storage of results… for example, dumping every N keys, as in the sketch below.
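Something like this (N is arbitrary; tune it to trade run time against how much sits only in memory between dumps):

from klepto.archives import dir_archive
import random
import string

N = 10
data_dict = dir_archive('storage', cached=True, serialized=True)

for count, j in enumerate(string.ascii_letters, 1):
    data_dict[j] = [[int(10 * random.random()) for i in range(3)] for k in range(1000)]
    if count % N == 0:
        data_dict.dump()    # flush the cache to the files on disk
data_dict.dump()            # flush whatever is left at the end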