Cassandra Timeouts with No CPU Usage

Submitted by 回眸只為那壹抹淺笑 on 2019-12-06 00:47:33

Two questions that would help:

  1. What's your timeout set to?
  2. What's the query?

Now some clarification on where I think you're going wrong here:

  1. The chart resolution is too coarse to diagnose a single query. A server could be doing nothing, run one expensive query that pegs some bottleneck for the query's entire duration, and still look unbottlenecked at that scale. Run iostat -x 1 on the servers while the problem is happening and you may find something drastically different from what the charts say at that resolution.
  2. If I'm reading your CPU usage chart correctly, it shows about 50% usage. On modern servers that can actually mean fully busy, because of hyperthreading and how aggregate CPU usage is calculated; see https://www.percona.com/blog/2015/01/15/hyper-threading-double-cpu-throughput/
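To make both points concrete, here is a rough sketch with made-up numbers. It assumes a disk that is idle except for a 10-second burst, charted at one point per 5 minutes, and a deliberately simplified hyperthreading model in which one busy logical CPU per physical core already saturates that core. The function names and all the figures are hypothetical, not taken from the charts in the question.

```python
def window_average(samples, window):
    """Collapse per-second samples into one averaged point per `window` seconds."""
    points = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        points.append(sum(chunk) / len(chunk))
    return points


def aggregate_cpu_percent(busy_logical, physical_cores, threads_per_core=2):
    """Aggregate CPU % as reported across all logical CPUs (simplified model)."""
    logical_cpus = physical_cores * threads_per_core
    return 100.0 * busy_logical / logical_cpus


# Hypothetical per-second disk utilisation (%): idle, except a 10 s burst at 100%.
per_second = [0.0] * 300
per_second[120:130] = [100.0] * 10

coarse = window_average(per_second, window=300)  # one chart point per 5 minutes
fine = window_average(per_second, window=1)      # roughly what a 1 s interval shows

print("5-minute chart point:", round(coarse[0], 1), "%")
print("worst 1-second sample:", max(fine), "%")

# 8 physical cores with hyperthreading, one busy thread on each core.
print("aggregate CPU:", aggregate_cpu_percent(busy_logical=8, physical_cores=8), "%")
```

With these numbers the 5-minute point comes out at about 3.3% even though the disk was pegged for 10 seconds, and the aggregate CPU figure is 50% even though every physical core has a busy thread.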

I suggest tracing the problematic query to see what Cassandra was doing.

https://docs.datastax.com/en/cql/3.1/cql/cql_reference/tracing_r.html

Open cqlsh, type TRACING ON, and execute your query. If everything looks fine, the problem may be intermittent, in which case I'd suggest tracing a sample of queries with nodetool settraceprobability for a while, until you manage to catch the problem.

You enable it on each node separately with nodetool settraceprobability <param>, where <param> is the probability (between 0 and 1) that a query gets traced. Careful: this WILL increase load, so start with a very low number and go up.
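Since the setting is per node, a small helper can save some typing. The following is only a sketch: the node addresses and the helper names are hypothetical, and it assumes nodetool is on your PATH and that each node can be reached with nodetool's -h option (remote JMX access is often restricted, in which case you would run the command locally on each node instead). It defaults to a dry run that just prints the commands.

```python
import subprocess


def settraceprobability_commands(hosts, probability):
    """Build one `nodetool settraceprobability` command per node."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return [
        ["nodetool", "-h", host, "settraceprobability", str(probability)]
        for host in hosts
    ]


def run_all(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Hypothetical node addresses; replace with your own.
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Start very low, as advised above, and raise it only if nothing is caught.
run_all(settraceprobability_commands(NODES, 0.001))
```

Once you have caught the problem, run the same thing with a probability of 0 to turn sampling back off.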

If the problem is intermittent, it might be caused by long garbage collection pauses, in which case you need to analyse the GC logs. Check how long your GCs take.
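As a starting point for that check, here is a minimal sketch that scans log lines for long pauses. It assumes your Cassandra system.log contains GCInspector lines of the shape "... GCInspector.java:NNN - <collector> GC in <N>ms. ...", which you should verify against your own logs since the format varies by version. The sample lines and the 2000 ms threshold are made up for illustration; use your actual timeout as the threshold.

```python
import re

# Assumed line shape (check your own logs): "... - <collector> GC in <N>ms. ..."
GC_PAUSE = re.compile(r"GCInspector.*? - (?P<collector>.+?) GC in (?P<ms>\d+)ms")


def long_pauses(lines, threshold_ms):
    """Return (collector, pause_ms) for every GC pause at or above the threshold."""
    hits = []
    for line in lines:
        match = GC_PAUSE.search(line)
        if match and int(match.group("ms")) >= threshold_ms:
            hits.append((match.group("collector"), int(match.group("ms"))))
    return hits


# Made-up sample lines, not real output.
SAMPLE = [
    "INFO  [Service Thread] GCInspector.java:284 - ParNew GC in 212ms.",
    "INFO  [Service Thread] GCInspector.java:284 - ConcurrentMarkSweep GC in 6230ms.",
    "INFO  [CompactionExecutor:3] CompactionTask.java:255 - unrelated line",
]

# Hypothetical threshold; set it to your configured timeout in milliseconds.
for collector, pause_ms in long_pauses(SAMPLE, threshold_ms=2000):
    print(collector, "paused for", pause_ms, "ms")
```

To run it against a real log, pass an open file object instead of SAMPLE. If pauses at or above your timeout line up with the times the queries failed, GC is the likely culprit.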

Edit: just to be clear, if this problem is caused by GCs you will NOT see it with tracing. So check your GCs first, and if they're not the problem, move on to tracing.
