Is it possible to avoid tombstone problems with Cassandra?

Posted by 杀马特。学长 韩版系。学妹 on 2019-12-24 13:31:19

Question


I am writing code for a CMS using Cassandra as the database system.

One of the strengths of the CMS is that it pre-calculates all sorts of things using a backend computer that runs permanently against the data that changes in the CMS.

For example, the CMS tells the list system that a page was created or changed. The list system saves that information in a table called list. That information is just a one-liner which tells me which page has to be worked on.

Column family: list
   Row: concerned website (e.g. http://www.example.com/)
     Column: full URI (e.g. http://www.example.com/this/page)
        Value: true (because you need something for the column to exist)

Once in a while (most often less than a second after a simple page edit), that list backend system wakes up, sees that a certain page changed, and starts working on it by updating all the lists that include (or no longer include) that page as an element. This allows the front end to instantly know the number of elements in a list and to read lists very quickly without running complex queries at the time the list is needed (as opposed to what many CMSs do using SQL...)

In effect, I am using the list table as a TODO list: a set of pages I have to work on. The front end adds page references to that list, and the backend deletes them once it is done with them. As a result I can end up with a very large number of tombstones in the list table. The real-world effect: I had tombstone failures and the system started failing in random places. And once the list stops working, many other things in the system stop working and the websites become unusable.

I decreased the time it takes Cassandra to clean up tombstones in that specific table (and a few others), but I am wondering whether I'm using Cassandra as intended. Is there a better way to handle a TODO list of this sort in this environment?

As a side note: the TODO list may be worked on from several different backend computers. On a small system, you are likely to have only one backend running against the list data; on larger systems with thousands of users, you may well have 2 or 3 backends just to handle lists. So having the data in Cassandra is very practical for sharing it quickly between computers.


Answer 1:


You have essentially implemented a queue, which is considered an anti-pattern for Cassandra: http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
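To see why a queue-like table hurts, here is a toy, in-memory model in Python. It is only a sketch and not how Cassandra is implemented: `ToyPartition`, `TombstoneOverflow` and the `compact` method are names made up for illustration. The one real detail it mimics is that a delete writes a marker instead of removing the cell, and that a read which scans past too many markers is aborted (Cassandra's `tombstone_failure_threshold` setting, 100,000 by default as far as I know).

```python
TOMBSTONE = object()  # marker a delete leaves behind in this toy model


class TombstoneOverflow(Exception):
    """Raised when a single read scans more tombstones than allowed."""


class ToyPartition:
    """Hypothetical stand-in for one row of the `list` column family."""

    def __init__(self, failure_threshold=100_000):
        self.cells = {}
        self.failure_threshold = failure_threshold

    def insert(self, column, value=True):
        self.cells[column] = value

    def delete(self, column):
        # A delete does not free the cell; it overwrites it with a marker.
        self.cells[column] = TOMBSTONE

    def read_live(self):
        """Return (live columns, tombstones scanned) for a full-row read."""
        live, scanned = [], 0
        for column in sorted(self.cells):
            if self.cells[column] is TOMBSTONE:
                scanned += 1
                if scanned > self.failure_threshold:
                    raise TombstoneOverflow(f"scanned {scanned} tombstones")
            else:
                live.append(column)
        return live, scanned

    def compact(self):
        # Stands in for compaction once the grace period has passed.
        self.cells = {c: v for c, v in self.cells.items() if v is not TOMBSTONE}


# Five pages are queued and processed, then one new page is queued.
todo = ToyPartition(failure_threshold=5)
for i in range(5):
    uri = f"http://www.example.com/page/{i}"
    todo.insert(uri)
    todo.delete(uri)
todo.insert("http://www.example.com/this/page")
live, scanned = todo.read_live()
```

With this setup the read returns the single live page but has to walk over five markers to find it. One more insert-and-delete cycle pushes the same read over the (artificially low) threshold, which is the kind of failure described in the question.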

There are workarounds and things people do to make queues behave better, but it's a hard game to play. Be sure to use LeveledCompactionStrategy and not the default; this will help a lot with smaller workloads. Consider workarounds like time-boxing the partitions (rows in the old Thrift terminology) and what's in the article linked above, but you may want to look for a different solution.
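As a rough illustration of time-boxing, the sketch below derives a partition key from the website plus a time bucket, so that each backend only reads the few most recent partitions and never scans old ones that are full of tombstones. Everything here is an assumption for the example: the 5-minute bucket width, the key format, and the helper names `bucket_start`, `partition_key` and `buckets_to_scan` are not from the question or from any Cassandra API.

```python
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(minutes=5)  # assumed bucket width; tune to your workload
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, width=BUCKET):
    """Round a timezone-aware timestamp down to the start of its bucket."""
    return EPOCH + ((ts - EPOCH) // width) * width


def partition_key(website, ts, width=BUCKET):
    """Hypothetical composite key: (website, bucket label)."""
    return (website, bucket_start(ts, width).strftime("%Y-%m-%dT%H:%M"))


def buckets_to_scan(website, now, lookback=3, width=BUCKET):
    """Keys a backend would read: the current bucket and a few before it,
    oldest first, so late-arriving entries are still picked up."""
    current = bucket_start(now, width)
    return [
        partition_key(website, current - i * width, width)
        for i in range(lookback - 1, -1, -1)
    ]


site = "http://www.example.com/"
now = datetime(2019, 12, 24, 13, 31, 19, tzinfo=timezone.utc)
key = partition_key(site, now)
recent = buckets_to_scan(site, now)
```

The front end would write each changed page under `partition_key(site, now)`, and a backend would query only the keys returned by `buckets_to_scan`. Older partitions are simply no longer read, so their tombstones stop affecting the TODO queries; how many buckets to look back is a trade-off between read cost and the risk of missing a late entry.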



Source: https://stackoverflow.com/questions/36240706/is-it-possible-to-avoid-tombstone-problems-with-cassandra
