Question
I am trying to use klepto to do LRU caching. I would like to store the cache to disk, and am currently using klepto's dir_archive
option for this. I have written the following code, largely based on the code in the klepto test scripts:
# Imports assumed; the original post omitted them.
import hashlib

from klepto import lru_cache
from klepto.archives import dir_archive


def mymap(data):
    return hashlib.sha256(data).hexdigest()


class MyLRUCache:
    @lru_cache(cache=dir_archive(cached=False), keymap=mymap, ignore='self', maxsize=5)
    def __call__(self, data):
        return data
    call = __call__

    def store(self, data):
        self.call(data)

    # I would also appreciate a better way to do this, if possible.
    def lookup(self, key):
        return self.call.__cache__()[key]
This code appears to work fine until the cache reaches maxsize. At that point, instead of using LRU to remove a single item, lru_cache purges the entire cache! Below is the piece of klepto source code that does this (https://github.com/uqfoundation/klepto/blob/master/klepto/safe.py):
# purge cache
if _len(cache) > maxsize:
    if cache.archived():
        cache.dump()
        cache.clear()
        queue.clear()
        refcount.clear()
    else: # purge least recently used cache entry
        key = queue_popleft()
        refcount[key] -= 1
        while refcount[key]:
            key = queue_popleft()
            refcount[key] -= 1
        del cache[key], refcount[key]
So my question is, why does klepto purge "archived" caches? Is it possible to use lru_cache and dir_archive together?
Also, if my code looks completely nuts, I would really appreciate some sample code of how I should be writing this, since there was not much documentation for klepto.
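To make the single-item eviction concrete, here is a minimal standalone sketch of the else branch quoted above, using only the standard library. The helper names touch and evict_lru are hypothetical and are not part of klepto; the sketch assumes that queue records every access in order and that refcount counts how many times each key currently appears in the queue.

```python
from collections import deque


def touch(key, value, cache, queue, refcount):
    """Record an access: store the value and append the key to the access queue."""
    cache[key] = value
    queue.append(key)
    refcount[key] = refcount.get(key, 0) + 1


def evict_lru(cache, queue, refcount):
    """Drop the least recently used key, mirroring the else branch quoted above."""
    key = queue.popleft()
    refcount[key] -= 1
    # A key that still appears later in the queue was used more recently,
    # so keep popping until a key's last remaining reference is removed.
    while refcount[key]:
        key = queue.popleft()
        refcount[key] -= 1
    del cache[key], refcount[key]
    return key


cache, queue, refcount = {}, deque(), {}
for k in ("a", "b", "c"):
    touch(k, k.upper(), cache, queue, refcount)
touch("a", "A", cache, queue, refcount)  # "a" is now the most recently used key

evicted = evict_lru(cache, queue, refcount)
```

With this access pattern, "b" is the entry removed, and "a" survives because it was touched again after "b" and "c".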
ADDITIONAL NOTES:
I also tried defining dir_archive with cached=True. The in-memory cache still gets purged when maxsize is reached, but the contents of the cache are dumped to the archived cache at that point. I have several problems with this:
1. The in-memory cache is only accurate until maxsize is reached, at which point it is wiped.
2. The archived cache is not affected by maxsize. Every time maxsize is reached by the in-memory cache, all items in the in-memory cache are dumped to the archived cache, regardless of how many are already there.
3. LRU caching seems impossible based on points 1 and 2.
Answer 1:
The answer is that you couldn't before your question, but now you can.
If you get the most recent klepto from GitHub and provide the new flag purge=False, then you get the behavior you are looking for. I just added this in response to your question.
In your case:
lru_cache(cache=dir_archive(cached=False), keymap=mymap, ignore='self', maxsize=5, purge=False)
Or, for example:
# Imports assumed; the original post omitted them.
from klepto import lru_cache
from klepto.archives import dict_archive

# purge=True: when maxsize is exceeded, everything is dumped to the archive
# and the in-memory cache is cleared.
@lru_cache(maxsize=3, cache=dict_archive('test'), purge=True)
def identity(x):
    return x

identity(1)
identity(2)
identity(3)
ic = identity.__cache__()
assert len(ic.keys()) == 3
assert len(ic.archive.keys()) == 0
identity(4)
assert len(ic.keys()) == 0
assert len(ic.archive.keys()) == 4
identity(5)
assert len(ic.keys()) == 1
assert len(ic.archive.keys()) == 4

# purge=False: the in-memory cache stays at maxsize, and the archive grows
# by one entry per eviction.
@lru_cache(maxsize=3, cache=dict_archive('test'), purge=False)
def inverse(x):
    return -x

inverse(1)
inverse(2)
inverse(3)
ic = inverse.__cache__()
assert len(ic.keys()) == 3
assert len(ic.archive.keys()) == 0
inverse(4)
assert len(ic.keys()) == 3
assert len(ic.archive.keys()) == 1
inverse(5)
assert len(ic.keys()) == 3
assert len(ic.archive.keys()) == 2
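The difference between the two modes can also be reproduced without klepto. The following is a toy model only, not klepto's implementation: the class ToyArchivedLRU and the function sizes_after_calls are hypothetical names, a plain dict stands in for the archive, and the model assumes that with purge=False the evicted entry is written to the archive, which is consistent with the assertions above.

```python
from collections import OrderedDict


class ToyArchivedLRU:
    """Hypothetical stand-in for an archived LRU cache; not klepto's code."""

    def __init__(self, maxsize, purge):
        self.maxsize = maxsize
        self.purge = purge
        self.memory = OrderedDict()  # in-memory entries, least recently used first
        self.archive = {}            # stands in for the on-disk archive

    def get(self, key, compute):
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as most recently used
            return self.memory[key]
        # Modeling choice: fall back to the archive before recomputing.
        value = self.archive[key] if key in self.archive else compute(key)
        self.memory[key] = value
        if len(self.memory) > self.maxsize:
            if self.purge:
                # Dump everything to the archive and wipe the in-memory cache.
                self.archive.update(self.memory)
                self.memory.clear()
            else:
                # Move only the least recently used entry to the archive.
                old_key, old_value = self.memory.popitem(last=False)
                self.archive[old_key] = old_value
        return value


def sizes_after_calls(purge, keys=(1, 2, 3, 4, 5)):
    """Return (in-memory size, archive size) after each call with maxsize=3."""
    toy = ToyArchivedLRU(maxsize=3, purge=purge)
    sizes = []
    for key in keys:
        toy.get(key, lambda x: x)
        sizes.append((len(toy.memory), len(toy.archive)))
    return sizes


purged = sizes_after_calls(purge=True)
kept = sizes_after_calls(purge=False)
```

Here purged ends with (0, 4) and (1, 4) for the fourth and fifth calls, while kept ends with (3, 1) and (3, 2), matching the counts asserted in the klepto example.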
Please add a ticket if this doesn't do what you were expecting. Thanks for the suggestion.
Source: https://stackoverflow.com/questions/32081207/how-to-use-lru-caching-on-disk-with-klepto