I inserted 10K rows into a Cassandra table with a TTL of 1 minute, all under a single partition.
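For illustration, the schema and inserts were along these lines (the column names and types here are my assumptions, not the actual table definition):

CREATE TABLE qcs.job (
    bucket text,
    created timestamp,
    seq int,
    job_type text,
    job_id text,
    payload text,
    PRIMARY KEY (bucket, created, seq, job_type, job_id)
);

INSERT INTO qcs.job (bucket, created, seq, job_type, job_id, payload)
VALUES ('job', '2018-04-04 11:19+0530', 1, 'jobType1522820944168', 'jobId1522820944168', '...')
USING TTL 60;  -- each row expires after 60 seconds and becomes a tombstone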
After the successful insert, I tried to read all the data back from that single partition, but it throws errors like the ones below:
WARN [ReadStage-2] 2018-04-04 11:39:44,833 ReadCommand.java:533 - Read 0 live rows and 100001 tombstone cells for query SELECT * FROM qcs.job LIMIT 100 (see tombstone_warn_threshold)
DEBUG [Native-Transport-Requests-1] 2018-04-04 11:39:44,834 ReadCallback.java:132 - Failed; received 0 of 1 responses
ERROR [ReadStage-2] 2018-04-04 11:39:44,836 StorageProxy.java:1906 - Scanned over 100001 tombstones during query 'SELECT * FROM qcs.job LIMIT 100' (last scanned row partion key was ((job), 2018-04-04 11:19+0530, 1, jobType1522820944168, jobId1522820944168)); query aborted
I understand a tombstone is a marker in the SSTable, not an actual deletion of the data.
So I performed a compaction and a repair using nodetool, roughly as shown below.
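The commands I ran were along these lines (keyspace qcs, table job):

# major compaction of the table
nodetool compact qcs job

# repair of the table
nodetool repair qcs job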
Even after that, when I read the data from the table, the same error appears in the log file.
1) How do I handle this scenario?
2) Could someone explain why this scenario happens, and why compaction and repair didn't resolve the issue?
Tombstones are only really deleted after the period specified by the table's gc_grace_seconds setting (10 days by default). This is done to make sure that any node that was down at the time of the deletion will pick up those changes after it recovers. There are blog posts that discuss this in great detail (the one from thelastpickle is recommended), as well as the DSE and Cassandra documentation.
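You can check what a table currently uses by querying the system schema (keyspace and table names taken from the question):

-- gc_grace_seconds defaults to 864000 seconds (10 days)
SELECT gc_grace_seconds, default_time_to_live
FROM system_schema.tables
WHERE keyspace_name = 'qcs' AND table_name = 'job';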
You can set the gc_grace_seconds option on the individual table to a lower value to remove deleted data faster, but this should only be done for tables that hold exclusively TTLed data. You may also need to tweak the tombstone_threshold and tombstone_compaction_interval compaction options so that tombstone-purging compactions run sooner; the Cassandra and DSE documentation describe these options. A sketch of both changes follows.
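For example, something along these lines; the values are illustrative only and should be tuned for your workload, and the compaction class must match whatever the table already uses:

-- shrink the tombstone GC grace period to 1 hour (default is 864000 seconds = 10 days)
ALTER TABLE qcs.job WITH gc_grace_seconds = 3600;

-- make compaction more aggressive about purging tombstones
ALTER TABLE qcs.job WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'tombstone_threshold': 0.1,              -- default 0.2
    'tombstone_compaction_interval': 3600    -- default 86400 (1 day)
};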
Source: https://stackoverflow.com/questions/49644528/tombstone-vs-nodetool-and-repair