Async writes seem to be broken in Cassandra

野趣味 2020-12-30 08:04

I have had issues with spark-cassandra-connector (1.0.4, 1.1.0) when writing batches of 9 million rows to a 12-node Cassandra (2.1.2) cluster. I was writing with consisten…

2 Answers
  •  暗喜 (OP)
     2020-12-30 08:39

    Nicola and I communicated over email this weekend and thought I'd provide an update here with my current theory. I took a look at the github project Nicola shared and experimented with an 8 node cluster on EC2.

    I was able to reproduce the issue with 2.1.2, but I also observed that after a period of time I could re-execute the Spark job and all 9 million rows were returned.

    What I noticed was that while nodes were under compaction, I did not get all 9 million rows back. On a whim I took a look at the changelog for 2.1 and found CASSANDRA-8429, "Some keys unreadable during compaction", which may explain this problem.

    Seeing that the issue has been fixed and is targeted for 2.1.3, I reran the test against the cassandra-2.1 branch, ran the count job while compaction activity was happening, and got all 9 million rows back.

    I'd like to experiment with this some more since my testing with the cassandra-2.1 branch was rather limited and the compaction activity may have been purely coincidental, but I'm hoping this may explain these issues.
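    The behaviour described above (counts come up short while compaction is running, then converge once it settles) can be sketched as a generic retry loop. This is purely illustrative: every name here is hypothetical, and `count_rows` stands in for the actual Spark count job against the cluster, stubbed so the sketch is self-contained.

    ```python
    import time

    EXPECTED_ROWS = 9_000_000

    def make_flaky_counter(short_counts, final_count):
        """Simulate a count that comes up short while compaction is running,
        then returns the full total once compaction has finished."""
        pending = list(short_counts)

        def count_rows():
            return pending.pop(0) if pending else final_count

        return count_rows

    def count_until_stable(count_rows, expected, retries=5, delay_s=0.0):
        """Re-execute the count job until it returns the expected total,
        or give up after `retries` attempts."""
        for attempt in range(1, retries + 1):
            n = count_rows()
            if n == expected:
                return attempt, n
            time.sleep(delay_s)  # wait for compaction to settle before retrying
        return retries, n

    # Two short counts while "compaction" is active, then the full total.
    counter = make_flaky_counter([8_750_123, 8_999_871], EXPECTED_ROWS)
    attempt, n = count_until_stable(counter, EXPECTED_ROWS)
    print(attempt, n)  # the third attempt sees the full row count
    ```

    In the real test the delay between attempts was wall-clock time on the cluster, not a sleep in the driver; the point is only that the same query converges once compaction activity stops.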
