Question
After I launched some long-running write jobs (batch inserts from an Apache Spark job using the Spark Cassandra Connector), Cassandra (v. 2.1) created thousands of SSTables for the target table (more than 4500). The minor compaction thresholds are set to their default values (4 to 32). This means that, in theory, a lot of minor compaction tasks should be scheduled automatically.
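To make that expectation concrete, here is a minimal sketch of how the two thresholds are usually understood to work, assuming the table uses size-tiered compaction (the default strategy in 2.1). It is not Cassandra's actual code: the function names are hypothetical, the 0.5/1.5 similarity bounds mirror the documented bucket_low/bucket_high defaults, and options such as min_sstable_size are ignored.

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of roughly similar size (simplified model)."""
    buckets = []  # each entry is [running_average, [member sizes]]
    for size in sorted(sizes):
        for bucket in buckets:
            avg = bucket[0]
            # A size joins a bucket when it is within the low/high band of its average.
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket[1].append(size)
                bucket[0] = sum(bucket[1]) / len(bucket[1])
                break
        else:
            # No existing bucket fits, so this SSTable starts a new one.
            buckets.append([float(size), [size]])
    return [members for _, members in buckets]


def candidate_compactions(sizes, min_threshold=4, max_threshold=32):
    """Return buckets eligible for a minor compaction, each capped at max_threshold."""
    return [
        bucket[:max_threshold]
        for bucket in bucket_by_size(sizes)
        if len(bucket) >= min_threshold
    ]


if __name__ == "__main__":
    # Hypothetical sizes in MB: many small SSTables plus a few large ones.
    sizes = [10] * 100 + [1000] * 3
    for candidate in candidate_compactions(sizes):
        print(len(candidate), "SSTables of about", candidate[0], "MB")
```

Under this simplified model, any bucket holding at least 4 similarly sized SSTables is a compaction candidate, and 32 only caps how many are merged in one task. With more than 4500 SSTables, that is why I expected compactions to be scheduled continuously.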
I checked the status, and nodetool indicated that no tasks were being scheduled. I stopped all operations for a few hours. Then I restarted the cluster multiple times. Waited some more. Disabled and re-enabled autocompaction. Waited. Increased the compaction throughput to 999 MB/s. Waited.
During these tests, only a few minor compactions started, seemingly at random, on some nodes and for a limited period of time. Most of the nodes did nothing for an entire day.
Then I decided to launch a major compaction manually (it is going to take days on Amazon EBS).
Why is Cassandra not running any automatic minor compactions, even though the number of SSTables is more than 100 times the maximum threshold (32)?
Answer 1:
The answer is in the documentation:
By default, a minor compaction can begin any time Cassandra creates four SSTables on disk for a column family. A minor compaction must begin before the total number of SSTables reaches 32.
The total number of my SSTables is far greater than 32...
Source: https://stackoverflow.com/questions/27434397/cassandra-high-number-of-sstables