Is there any good known solution for a fast persistent cache? [closed]

Posted by 江枫思渺然 on 2021-02-07 13:48:30

Question


I need a really fast, persistent cache for my web crawler. It doesn't need to be as fast as ConcurrentSkipListSet in Java, but it definitely cannot be MySQL with a hash-index-based table, which I tried. After 1M+ records it takes about 80% of processor time.

Does anyone know of, or has anyone heard of, something useful for this case?
Thanks for any hint.


Answer 1:


Try EhCache. It's primarily an in-memory cache, with options for overflow and persistence to a disk backing store. It has been around for years, is still actively developed, and is very mature.
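To make the "in-memory with overflow and persistence to disk" idea concrete, here is a minimal sketch in plain Java. It is not Ehcache's API or implementation; the class name `DiskOverflowCache`, its methods, and the one-file-per-key layout are all assumptions made up for illustration. A small LRU map holds hot entries, evicted entries spill to disk, and `flush()` writes the rest so a restarted process can read them back.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: LRU memory tier that spills evicted entries to one file per key. */
final class DiskOverflowCache {
    private final Path dir;
    private final LinkedHashMap<String, String> memory;

    DiskOverflowCache(Path dir, final int maxInMemory) {
        try {
            this.dir = Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() <= maxInMemory) {
                    return false;
                }
                // Overflow: persist the least recently used entry before dropping it.
                writeToDisk(eldest.getKey(), eldest.getValue());
                return true;
            }
        };
    }

    synchronized void put(String key, String value) {
        memory.put(key, value);
    }

    synchronized String get(String key) {
        String value = memory.get(key);
        if (value != null) {
            return value;
        }
        Path file = fileFor(key);
        if (!Files.exists(file)) {
            return null;
        }
        try {
            value = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        memory.put(key, value); // promote to memory; may spill another entry to disk
        return value;
    }

    /** Writes every in-memory entry to disk so the cache survives a restart. */
    synchronized void flush() {
        memory.forEach(this::writeToDisk);
    }

    private void writeToDisk(String key, String value) {
        try {
            Files.write(fileFor(key), value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // URL-safe Base64 keeps keys such as URLs valid as file names.
    // Very long keys would exceed file name limits; a real store would hash them.
    private Path fileFor(String key) {
        byte[] raw = key.getBytes(StandardCharsets.UTF_8);
        return dir.resolve(Base64.getUrlEncoder().withoutPadding().encodeToString(raw));
    }
}
```

For a crawler, the key could be the URL and the value whatever you need to remember about it. One file per key is the simplest thing that works for a sketch; a mature cache uses a far more compact on-disk format, which is a good reason to prefer an existing library over rolling your own.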




Answer 2:


I'm an employee at Terracotta (not an engineer), but I figure adding some clarity regardless of my skill set would benefit those who've consulted this posting for answers.

Yes, Ehcache is a well-used option when it comes to caching: it has over 500,000 deployments internationally and is commonly used in small clusters with a distributed cache. If your application is Java based, Terracotta will arguably offer the largest performance increases with "BigData" because it gives applications in-memory speeds with off-heap advantages.

  1. Yes, BigMemory Go is free. It's a 32 GB freemium offering, not to be confused with open source. It cannot be used in a distributed cache; that option is with BigMemory Max, and the GB limit is much less.

  2. BigMemory is persistent to disk. The Terracotta Server Array (L2) communicates with disk to ensure data isn't lost even in catastrophic power failures. Terracotta has ACID-like properties, with 99.999% data durability. *The concept of the Terracotta Server Array usually causes a lot of confusion; refer to http://terracotta.org/documentation/terracotta-server-array/server-arrays for more information.

  3. BigMemory is an off-heap data store, free from garbage collection entirely. This is done via byte code buffers, and the data store is actively managed by Automatic Resource Control. Depending on the requirements you decide on (i.e. how many objects you want in cache, whether you want immediate or eventual throughput, time to live of objects, etc.), Automatic Resource Control will do this work for you. This means no GC, heap sizes limited by your server's available memory, and most importantly, no tuning required.

  4. Knowing how large a cache you need is a matter of guess and check: each application is unique, and thus we cannot confidently estimate how much data you need to place into memory. I'd be suspicious of any vendor who tells you one needs to place "n" GB into cache to reach SLAs of xyz...
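For readers wondering what "off heap" in point 3 means in practice, the general technique can be sketched with the JDK's direct `ByteBuffer`. This is only an illustration of the idea, not how BigMemory is implemented; the class `OffHeapStore` and its append-only layout are assumptions invented for the example. Values are stored as bytes outside the Java heap, so the garbage collector never scans them, and only a small index of offsets stays on the heap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: values live in one direct (off-heap) buffer; only the index is on heap. */
final class OffHeapStore {
    private final ByteBuffer buffer;
    // key -> {offset, length} of the value's bytes inside the buffer
    private final Map<String, int[]> index = new HashMap<>();

    OffHeapStore(int capacityBytes) {
        // allocateDirect reserves memory outside the garbage-collected heap.
        this.buffer = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Appends the value; returns false when the buffer has no room left. */
    synchronized boolean put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > buffer.remaining()) {
            return false;
        }
        int offset = buffer.position();
        buffer.put(bytes);
        // Overwriting a key leaves the old bytes unused; a real store would reclaim them.
        index.put(key, new int[] {offset, bytes.length});
        return true;
    }

    synchronized String get(String key) {
        int[] slot = index.get(key);
        if (slot == null) {
            return null;
        }
        byte[] bytes = new byte[slot[1]];
        ByteBuffer view = buffer.duplicate(); // shares content, has its own position
        view.position(slot[0]);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The trade-off is that every read copies and decodes bytes back into a heap object, so off-heap storage buys capacity and GC predictability at the cost of serialization work per access.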

My apologies in advance if I broke a code of ethics by posting on here or there was any implied bias. Hopefully this info was able to add some clarity and shed some light on common questions about Terracotta.




Answer 3:


I am working on cache2k and researching recent cache eviction policies to make it the fastest Java cache around; see the cache2k benchmarks.

Persistence is being added right now and will be available for preview and testing in two weeks. I expect it to be very stable in five weeks. The cache2k implementation is, of course, not as mature as EHCache; however, everything released is used within our own applications and proves itself in production environments.

Update: The "two weeks" was very optimistic, since the whole locking concept finally needed a rewrite and careful inspection... You can track the persistence support currently emerging on GitHub.



Source: https://stackoverflow.com/questions/7132761/is-there-any-good-known-solution-for-a-fast-persistent-cache
