Question
My compression strategy in production was LZ4, and I changed it to Deflate.
To apply the compression change, we had to run nodetool upgradesstables to forcefully rewrite all existing SSTables with the new compression, as sketched below.
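A minimal sketch of the change; the keyspace and table names (my_keyspace.my_table) are placeholders, not our actual schema:

-- In cqlsh: switch the table's compressor from LZ4 to Deflate
ALTER TABLE my_keyspace.my_table
  WITH compression = {'class': 'DeflateCompressor'};

# Then, on every node, rewrite the existing SSTables with the new compressor;
# -a (--include-all-sstables) forces rewriting SSTables that are already on
# the current SSTable version:
nodetool upgradesstables -a my_keyspace my_table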
But once the upgradesstables command completed on all 5 nodes in the cluster, my requests started to fail, both reads and writes.
The issue was traced to one specific node out of the 5-node cluster, and to a specific table on that node. The whole cluster has roughly the same amount of data and the same configuration, but this one node in particular is misbehaving.
Output of nodetool status
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN xx.xxx.xx.xxx 283.94 GiB 256 40.4% 24950207-5fbc-4ea6-92aa-d09f37e83a1c rack1
UN xx.xxx.xx.xxx 280.55 GiB 256 39.9% 4ecdf7f8-a4d8-4a94-a930-1a87a80ae510 rack1
UN xx.xxx.xx.xxx 284.61 GiB 256 40.5% de2ada08-264b-421a-961f-5fd113f28208 rack1
UN YY.YYY.YY.YYY 280.44 GiB 256 40.2% 68c7c130-6cf8-4864-bde8-1819f238045c rack2
UN xx.xxx.xx.xxx 273.71 GiB 256 39.0% 6c080e47-ffb2-4fbc-bc7e-73df19103d2a rack2
The node YY.YYY.YY.YYY above is the one having the errors.
Cluster Configuration
- Replication factor -> 2
- Read consistency -> ONE
- Write consistency -> ONE
- FYI, I am also using lightweight transactions
- Cassandra version: 3.10
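For reference, settings like these would look roughly as follows in CQL; the keyspace, table, and column names are placeholders and not our actual schema:

-- Keyspace with replication factor 2 in a single datacenter:
ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2};

-- Reads and writes at consistency level ONE, e.g. in cqlsh:
CONSISTENCY ONE;

-- A lightweight transaction is simply a conditional write, for example:
INSERT INTO my_keyspace.my_table (id, value) VALUES (1, 'x') IF NOT EXISTS;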
nodetool tablestats for that specific table shows a high number of dropped mutations.
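The stats below were collected roughly with an invocation like this (keyspace and table names are placeholders):

nodetool tablestats my_keyspace.my_table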
SSTable count: 11
Space used (live): 9.82 GiB
Space used (total): 9.82 GiB
Space used by snapshots (total): 0 bytes
Off heap memory used (total): 26.77 MiB
SSTable Compression Ratio: 0.1840953951763564
Number of keys (estimate): 15448921
Memtable cell count: 8558
Memtable data size: 5.89 MiB
Memtable off heap memory used: 0 bytes
Memtable switch count: 5
Local read count: 67792
Local read latency: 92.314 ms
Local write count: 31336
Local write latency: 0.067 ms
Pending flushes: 0
Percent repaired: 21.18
Bloom filter false positives: 1
Bloom filter false ratio: 0.00794
Bloom filter space used: 22.2 MiB
Bloom filter off heap memory used: 18.45 MiB
Index summary off heap memory used: 3.24 MiB
Compression metadata off heap memory used: 5.08 MiB
Compacted partition minimum bytes: 87
Compacted partition maximum bytes: 943127
Compacted partition mean bytes: 3058
Average live cells per slice (last five minutes): 1.0
Maximum live cells per slice (last five minutes): 1
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 4.13 KiB
nodetool info shows:
Gossip active : true
Thrift active : false
Native Transport active: true
Load : 280.43 GiB
Generation No : 1514537104
Uptime (seconds) : 8810363
Heap Memory (MB) : 1252.06 / 3970.00
Off Heap Memory (MB) : 573.33
Data Center : dc1
Rack : rack1
Exceptions : 18987
Key Cache : entries 351612, size 99.86 MiB, capacity 100 MiB, 11144584 hits, 21126425 requests, 0.528 recent hit rate, 14400 save period in seconds
Out of the 5 nodes, this one specific node has a high number of dropped mutations (around 560 KB) and failing reads, even though it has the same configuration as the others and owns roughly the same amount of data.
We tried repairing that node, but that did not bring down the dropped mutations and the requests kept failing.
We also restarted the Cassandra service on that node, but the dropped mutations still kept increasing.
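The repair and restart were run roughly as follows; this is a sketch, where the keyspace name is a placeholder, the exact repair options we used may have differed, and the restart command depends on how Cassandra is installed:

# Full repair of the token ranges this node is primary for:
nodetool repair -full -pr my_keyspace

# Restart the Cassandra service on the node:
sudo service cassandra restart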
system.log
ERROR [ReadRepairStage:10229] 2018-04-11 16:02:12,954 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:10229,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
ERROR [ReadRepairStage:10231] 2018-04-11 16:02:17,551 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:10231,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
ERROR [ReadRepairStage:10232] 2018-04-11 16:02:22,221 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:10232,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
debug.log
DEBUG [ReadRepairStage:161301] 2018-04-11 01:45:01,432 DataResolver.java:169 - Timeout while read-repairing after receiving all 1 data and digest responses
ERROR [ReadRepairStage:161301] 2018-04-11 01:45:01,432 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:161301,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
DEBUG [ReadRepairStage:161304] 2018-04-11 01:45:02,692 ReadCallback.java:242 - Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-4042387324575455696, 229229902e5a43588d52466b8063b557) (d41d8cd98f00b204e9800998ecf8427e vs 4662dce3dcb05114ed670fbc40291d53)
at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
DEBUG [GossipStage:1] 2018-04-11 01:45:02,958 FailureDetector.java:457 - Ignoring interval time of 2000158817 for /xx.xxx.xx.xxx
WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-04-11 01:45:04,665 NoSpamLogger.java:94 - Out of 1 commit log syncs over the past 0.00s with average duration of 180655.05ms, 1 have exceeded the configured commit interval by an average of 170655.05ms
DEBUG [ReadRepairStage:161303] 2018-04-11 01:45:04,693 DataResolver.java:169 - Timeout while read-repairing after receiving all 1 data and digest responses
ERROR [ReadRepairStage:161303] 2018-04-11 01:45:04,709 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:161303,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
INFO [ScheduledTasks:1] 2018-04-11 01:45:07,353 MessagingService.java:1214 - MUTATION messages were dropped in last 5000 ms: 87 internal and 77 cross node. Mean internal dropped latency: 89509 ms and Mean cross-node dropped latency: 95871 ms
INFO [ScheduledTasks:1] 2018-04-11 01:45:07,354 MessagingService.java:1214 - HINT messages were dropped in last 5000 ms: 0 internal and 93 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 86440 ms
INFO [ScheduledTasks:1] 2018-04-11 01:45:07,354 MessagingService.java:1214 - READ_REPAIR messages were dropped in last 5000 ms: 0 internal and 72 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 73159 ms
I hope someone can help me with this.
Update:
I increased the heap size to 9 GB on this node.
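A sketch of the heap change, assuming a standard 3.x layout (the heap is normally set in conf/jvm.options or conf/cassandra-env.sh; only the 9 GB value is from our actual change):

# conf/jvm.options
-Xms9G
-Xmx9G

# or equivalently in conf/cassandra-env.sh
MAX_HEAP_SIZE="9G"

nodetool info after the change shows: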
ID : 68c7c130-6cf8-4864-bde8-1819f238045c
Gossip active : true
Thrift active : false
Native Transport active: true
Load : 279.32 GiB
Generation No : 1523504294
Uptime (seconds) : 9918
Heap Memory (MB) : 5856.73 / 9136.00
Off Heap Memory (MB) : 569.67
Data Center : dc1
Rack : rack2
Exceptions : 862
Key Cache : entries 3650, size 294.83 KiB, capacity 100 MiB, 8112 hits, 22015 requests, 0.368 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache : entries 7680, size 480 MiB, capacity 480 MiB, 1282773 misses, 1292444 requests, 0.007 recent hit rate, 3797.874 microseconds miss latency
Percent Repaired : 6.190888093280888%
Token : (invoke with -T/--tokens to see all 256 tokens)
Answer 1:
We faced this issue ourselves and resolved it, as a last resort, by removing the node from the cluster (we believed there was some unknown hardware failure or a memory leak of that sort).
We recommend removing the node using nodetool removenode instead of nodetool decommission, because you do not want to stream data from the failed node but rather from one of its replicas.
(This was a safety measure, to avoid the possibility of streaming corrupt data to the other nodes.) A sketch of the removal is shown below.
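A sketch of the removal, using the Host ID of the failing node from the nodetool status output above:

# Run from any other live node in the cluster:
nodetool removenode 68c7c130-6cf8-4864-bde8-1819f238045c

# Progress of the removal can be checked with:
nodetool removenode status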
After we removed the node, the cluster's health came back to normal and it functioned normally again.
Answer 2:
1) You're on 3.10; you should strongly consider upgrading to 3.11.2, since a lot of critical bugs are fixed in 3.11.2.
2) If you have one node that's misbehaving, and RF=3, then it's likely that you're treating that one node differently from the others. It may be that your application only connects to that one host and the cost of coordinating is overwhelming it, or you may have a disproportionate amount of data on it because of some misconfiguration (it looks like you have RF=3 with 2 racks, so it's quite possible the data is not distributed the way you expect).
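A rough way to check both points (keyspace, table, and key below are placeholders): compare the effective ownership computed for your keyspace's actual replication settings, and look at which replicas own a given partition key.

# Ownership computed for a specific keyspace's replication strategy:
nodetool status my_keyspace

# Replicas responsible for a given partition key:
nodetool getendpoints my_keyspace my_table 'some_partition_key'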
Source: https://stackoverflow.com/questions/49788478/cassandra-cluster-specific-node-specific-table-high-dropped-mutations