Cassandra Data Replication problem

Submitted by 亡梦爱人 on 2019-12-04 12:02:09

Question


I have a two-node Cassandra cluster with a replication factor of 2 and AutoBootStrap=true. Everything is fine at startup and both nodes see each other. Let us call these nodes A and B.

  1. Add a set of keys and columns (let's call this set K1) to Cassandra through node A.
  2. Connect to node A and read back set K1. Do the same on node B. Both succeed.
  3. Kill the Cassandra process on node B.
  4. Add set K2 through node A.
  5. Connect to node A and read set K2. This succeeds.
  6. Restart the Cassandra process on node B.
  7. Try to read all keys from node B: set K1 is present, set K2 is MISSING (even after 30 minutes).
  8. Add set K3 through A or B.
  9. Read all keys from node A: returns sets K1, K2, K3.
  10. Read all keys from node B: returns sets K1, K3.

Node B never syncs set K2 (it has been more than 12 hours). Why does node B not see set K2? Does anyone have any idea?


Added info:

OK, this was the problem: the read_consistency_level was set to 1 (that is, ONE) by default.

So when we ask node B for set K2 and it doesn't have it (although it should, because the replication factor is 2), it immediately returns a 'Not found' error.

However, if we set the read consistency level to QUORUM or ALL, then B is forced to ask A, which returns the correct value, and B syncs up that key (saves it locally).
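The behaviour described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: plain dicts stand in for the two replicas, the `quorum` and `read` helpers are hypothetical names rather than anything from Cassandra's API, and background repair and timeouts are ignored. It does show why, with a replication factor of 2, QUORUM already means both replicas (floor(2 / 2) + 1 = 2).

```python
# Toy model only: plain dicts stand in for replicas; this is not Cassandra's API.

def quorum(replication_factor):
    """Number of replicas a QUORUM operation must reach: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read(key, local, others, required):
    """Read `key` via the `local` replica, consulting `required` replicas in total.

    Each replica maps key -> (timestamp, value). The newest version wins and is
    written back to every consulted replica that lacks it (simplified read repair).
    """
    consulted = [local] + others[: required - 1]
    versions = [replica[key] for replica in consulted if key in replica]
    if not versions:
        return None  # corresponds to the 'Not found' result in the question
    newest = max(versions)  # tuples compare by timestamp first
    for replica in consulted:
        if replica.get(key) != newest:
            replica[key] = newest
    return newest[1]


node_a = {"k1": (1, "v1"), "k2": (2, "v2")}
node_b = {"k1": (1, "v1")}  # B was down when k2 was written

print(read("k2", node_b, [node_a], required=1))          # ONE: only B is consulted
print(read("k2", node_b, [node_a], required=quorum(2)))  # QUORUM with RF=2: A and B
print(read("k2", node_b, [node_a], required=1))          # ONE again, after the repair
```

In this model the first read finds nothing, the second returns the value from A and copies it to B, and the third then succeeds from B alone.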

This leads to another problem: when node B comes back up, it does not sync all the data from node A, even after a long time. If node A now goes down, how can we access that unsynced data? (I just tested this, and we can't.)

I guess there must be a way to force the data to sync. The INFO output in the terminal shows that a hinted handoff of 15 rows from A to B occurred when B came up, but B does not have those rows locally (we still can't read them from B with consistency level ONE). What's going on here?


Answer 1:


There are three ways Cassandra syncs updates that happened while a node was down:

  1. Hinted handoff. Requires that the failure detector on A recognize that B is down before you write K2. See http://wiki.apache.org/cassandra/HintedHandoff
  2. Read repair. Requires that B be up when K2 is requested for the repair to happen. See http://wiki.apache.org/cassandra/ReadRepair
  3. Anti-entropy repair. Requires invoking it manually ("nodetool repair"). See http://wiki.apache.org/cassandra/AntiEntropy
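To make the conditions in points 1 and 3 concrete, here is a rough sketch in the same toy style. The `Coordinator` class and `repair` function are hypothetical names made up for this illustration; real Cassandra is considerably more involved (for example, its anti-entropy repair does not compare replicas key by key as this model does). The point is only that a hint exists only for writes made after the node was detected as down, while a repair reconciles everything.

```python
# Toy model only: the class and function names are hypothetical, not Cassandra's API.

class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas    # name -> dict of key -> (timestamp, value)
        self.believed_down = set()  # what the failure detector currently reports
        self.hints = {}             # name -> list of (key, version) to replay later

    def write(self, key, version, reachable):
        for name, data in self.replicas.items():
            if name in reachable:
                data[key] = version
            elif name in self.believed_down:
                # A hint is stored only if the node is already known to be down.
                self.hints.setdefault(name, []).append((key, version))
            # Otherwise the write to that replica is simply missed.

    def replay_hints(self, name):
        for key, version in self.hints.pop(name, []):
            self.replicas[name][key] = version


def repair(left, right):
    """Simplified anti-entropy: both replicas end up with the newest version of each key."""
    for key in set(left) | set(right):
        versions = [replica[key] for replica in (left, right) if key in replica]
        left[key] = right[key] = max(versions)


a, b = {}, {}
coord = Coordinator({"A": a, "B": b})

coord.write("k2a", (1, "missed"), reachable={"A"})  # B just died, not yet detected
coord.believed_down.add("B")                        # failure detector catches up
coord.write("k2b", (2, "hinted"), reachable={"A"})  # now a hint is stored for B

coord.replay_hints("B")  # B comes back up
print(sorted(b))         # only the hinted key has arrived
repair(a, b)
print(sorted(b))         # after the repair B holds both keys
```

In this model the key written before the failure was detected reaches B only through the explicit repair step, which is the role "nodetool repair" plays in the answer above.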


Source: https://stackoverflow.com/questions/3827441/cassandra-data-replication-problem
