Question
My compression strategy in production was LZ4, and I changed it to Deflate.
To apply the compression change, we had to run nodetool upgradesstables to forcefully rewrite all existing SSTables with the new compression, as sketched below.
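A minimal sketch of the change; the keyspace and table names (my_keyspace.my_table) are placeholders, not our actual schema:

-- In cqlsh: switch the table's compressor from LZ4 to Deflate
ALTER TABLE my_keyspace.my_table
  WITH compression = {'class': 'DeflateCompressor'};

# Then, on every node, rewrite the existing SSTables with the new compressor;
# -a (--include-all-sstables) forces rewriting SSTables that are already on
# the current SSTable version:
nodetool upgradesstables -a my_keyspace my_table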
But once the upgradesstables command completed on all 5 nodes in the cluster, my requests started to fail, both reads and writes.
The issue was traced to one specific node out of the 5-node cluster, and to a specific table on that node. The whole cluster has roughly the same amount of data and the same configuration, but this one node in particular is misbehaving.
Output of nodetool status
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN xx.xxx.xx.xxx 283.94 GiB 256 40.4% 24950207-5fbc-4ea6-92aa-d09f37e83a1c rack1
UN xx.xxx.xx.xxx 280.55 GiB 256 39.9% 4ecdf7f8-a4d8-4a94-a930-1a87a80ae510 rack1
UN xx.xxx.xx.xxx 284.61 GiB 256 40.5% de2ada08-264b-421a-961f-5fd113f28208 rack1
UN YY.YYY.YY.YYY 280.44 GiB 256 40.2% 68c7c130-6cf8-4864-bde8-1819f238045c rack2
UN xx.xxx.xx.xxx 273.71 GiB 256 39.0% 6c080e47-ffb2-4fbc-bc7e-73df19103d2a rack2
The node YY.YYY.YY.YYY above is the one having the errors.
Cluster Configuration
- Replication factor -> 2
- Read consistency -> ONE
- Write consistency -> ONE
- FYI, I am also using lightweight transactions
- Cassandra version: 3.10
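For reference, settings like these would look roughly as follows in CQL; the keyspace, table, and column names are placeholders and not our actual schema:

-- Keyspace with replication factor 2 in a single datacenter:
ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2};

-- Reads and writes at consistency level ONE, e.g. in cqlsh:
CONSISTENCY ONE;

-- A lightweight transaction is simply a conditional write, for example:
INSERT INTO my_keyspace.my_table (id, value) VALUES (1, 'x') IF NOT EXISTS;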
nodetool tablestats for that specific table shows a high number of dropped mutations.
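The stats below were collected roughly with an invocation like this (keyspace and table names are placeholders):

nodetool tablestats my_keyspace.my_table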
SSTable count: 11
Space used (live): 9.82 GiB
Space used (total): 9.82 GiB
Space used by snapshots (total): 0 bytes
Off heap memory used (total): 26.77 MiB
SSTable Compression Ratio: 0.1840953951763564
Number of keys (estimate): 15448921
Memtable cell count: 8558
Memtable data size: 5.89 MiB
Memtable off heap memory used: 0 bytes
Memtable switch count: 5
Local read count: 67792
Local read latency: 92.314 ms
Local write count: 31336
Local write latency: 0.067 ms
Pending flushes: 0
Percent repaired: 21.18
Bloom filter false positives: 1
Bloom filter false ratio: 0.00794
Bloom filter space used: 22.2 MiB
Bloom filter off heap memory used: 18.45 MiB
Index summary off heap memory used: 3.24 MiB
Compression metadata off heap memory used: 5.08 MiB
Compacted partition minimum bytes: 87
Compacted partition maximum bytes: 943127
Compacted partition mean bytes: 3058
Average live cells per slice (last five minutes): 1.0
Maximum live cells per slice (last five minutes): 1
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 4.13 KiB
nodetool info shows:
Gossip active : true
Thrift active : false
Native Transport active: true
Load : 280.43 GiB
Generation No : 1514537104
Uptime (seconds) : 8810363
Heap Memory (MB) : 1252.06 / 3970.00
Off Heap Memory (MB) : 573.33
Data Center : dc1
Rack : rack1
Exceptions : 18987
Key Cache : entries 351612, size 99.86 MiB, capacity 100 MiB, 11144584 hits, 21126425 requests, 0.528 recent hit rate, 14400 save period in seconds
Out of the 5 nodes, this one specific node has a high number of dropped mutations (around 560 KB) and failing reads, even though it has the same configuration as the others and owns roughly the same amount of data.
We tried repairing that node, but that did not bring down the dropped mutations and the requests kept failing.
We also restarted the Cassandra service on that node, but the dropped mutations still kept increasing.
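The repair and restart were run roughly as follows; this is a sketch, where the keyspace name is a placeholder, the exact repair options we used may have differed, and the restart command depends on how Cassandra is installed:

# Full repair of the token ranges this node is primary for:
nodetool repair -full -pr my_keyspace

# Restart the Cassandra service on the node:
sudo service cassandra restart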
system.log
ERROR [ReadRepairStage:10229] 2018-04-11 16:02:12,954 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:10229,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
ERROR [ReadRepairStage:10231] 2018-04-11 16:02:17,551 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:10231,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
ERROR [ReadRepairStage:10232] 2018-04-11 16:02:22,221 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:10232,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
debug.log
DEBUG [ReadRepairStage:161301] 2018-04-11 01:45:01,432 DataResolver.java:169 - Timeout while read-repairing after receiving all 1 data and digest responses
ERROR [ReadRepairStage:161301] 2018-04-11 01:45:01,432 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:161301,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
DEBUG [ReadRepairStage:161304] 2018-04-11 01:45:02,692 ReadCallback.java:242 - Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-4042387324575455696, 229229902e5a43588d52466b8063b557) (d41d8cd98f00b204e9800998ecf8427e vs 4662dce3dcb05114ed670fbc40291d53)
at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
DEBUG [GossipStage:1] 2018-04-11 01:45:02,958 FailureDetector.java:457 - Ignoring interval time of 2000158817 for /xx.xxx.xx.xxx
WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-04-11 01:45:04,665 NoSpamLogger.java:94 - Out of 1 commit log syncs over the past 0.00s with average duration of 180655.05ms, 1 have exceeded the configured commit interval by an average of 170655.05ms
DEBUG [ReadRepairStage:161303] 2018-04-11 01:45:04,693 DataResolver.java:169 - Timeout while read-repairing after receiving all 1 data and digest responses
ERROR [ReadRepairStage:161303] 2018-04-11 01:45:04,709 CassandraDaemon.java:229 - Exception in thread Thread[ReadRepairStage:161303,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.10.jar:3.10]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.10.jar:3.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.10.jar:3.10]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
INFO [ScheduledTasks:1] 2018-04-11 01:45:07,353 MessagingService.java:1214 - MUTATION messages were dropped in last 5000 ms: 87 internal and 77 cross node. Mean internal dropped latency: 89509 ms and Mean cross-node dropped latency: 95871 ms
INFO [ScheduledTasks:1] 2018-04-11 01:45:07,354 MessagingService.java:1214 - HINT messages were dropped in last 5000 ms: 0 internal and 93 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 86440 ms
INFO [ScheduledTasks:1] 2018-04-11 01:45:07,354 MessagingService.java:1214 - READ_REPAIR messages were dropped in last 5000 ms: 0 internal and 72 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 73159 ms
I hope someone can help me with this.
Update:
I increased the heap size to 9 GB on this node.
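A sketch of the heap change, assuming a standard 3.x layout (the heap is normally set in conf/jvm.options or conf/cassandra-env.sh; only the 9 GB value is from our actual change):

# conf/jvm.options
-Xms9G
-Xmx9G

# or equivalently in conf/cassandra-env.sh
MAX_HEAP_SIZE="9G"

nodetool info after the change shows: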
ID : 68c7c130-6cf8-4864-bde8-1819f238045c
Gossip active : true
Thrift active : false
Native Transport active: true
Load : 279.32 GiB
Generation No : 1523504294
Uptime (seconds) : 9918
Heap Memory (MB) : 5856.73 / 9136.00
Off Heap Memory (MB) : 569.67
Data Center : dc1
Rack : rack2
Exceptions : 862
Key Cache : entries 3650, size 294.83 KiB, capacity 100 MiB, 8112 hits, 22015 requests, 0.368 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache : entries 7680, size 480 MiB, capacity 480 MiB, 1282773 misses, 1292444 requests, 0.007 recent hit rate, 3797.874 microseconds miss latency
Percent Repaired : 6.190888093280888%
Token : (invoke with -T/--tokens to see all 256 tokens)
Answer 1:
We faced this issue ourselves and resolved it, as a last resort, by removing the node from the cluster (we believed there was some unknown hardware failure or a memory leak of that sort).
We recommend removing the node using nodetool removenode instead of nodetool decommission, because you do not want to stream data from the failed node but rather from one of its replicas.
(This was a safety measure, to avoid the possibility of streaming corrupt data to the other nodes.) A sketch of the removal is shown below.
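A sketch of the removal, using the Host ID of the failing node from the nodetool status output above:

# Run from any other live node in the cluster:
nodetool removenode 68c7c130-6cf8-4864-bde8-1819f238045c

# Progress of the removal can be checked with:
nodetool removenode status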
After we removed the node, the cluster's health came back to normal and it functioned normally again.
Answer 2:
1) You're on 3.10; you should strongly consider upgrading to 3.11.2, since a lot of critical bugs are fixed in 3.11.2.
2) If you have one node that's misbehaving, and RF=3, then it's likely that you're treating that one node differently from the others. It may be that your application only connects to that one host and the cost of coordinating is overwhelming it, or you may have a disproportionate amount of data on it because of some misconfiguration (it looks like you have RF=3 with 2 racks, so it's quite possible the data is not distributed the way you expect).
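A rough way to check both points (keyspace, table, and key below are placeholders): compare the effective ownership computed for your keyspace's actual replication settings, and look at which replicas own a given partition key.

# Ownership computed for a specific keyspace's replication strategy:
nodetool status my_keyspace

# Replicas responsible for a given partition key:
nodetool getendpoints my_keyspace my_table 'some_partition_key'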
Source: https://stackoverflow.com/questions/49788478/cassandra-cluster-specific-node-specific-table-high-dropped-mutations