DataStax Enterprise: saveToCassandra generate a lot of hinted handoff

痴心易碎 submitted on 2019-12-04 20:17:06

The sleep in your statement is most likely not actually slowing down the execution of this query. The operations are applied on a per-partition basis, so my guess is that the sleep simply pauses once before each entire partition begins to be written.

As for the real issue: the only reason you will be generating hints is that one of your nodes is unable to keep up with the amount of data your Spark job is writing. A hint means that a replica was unreachable when the mutation was executed, so the coordinating node saved a serialized copy of the mutation to deliver once the unreachable node came back online. You can throttle down the batch size to lower the amount of concurrent writes using

spark.cassandra.output.batch.size.rows: number of rows per single batch; default is 'auto' which means the connector will adjust the number of rows based on the amount of data in each row

or

spark.cassandra.output.batch.size.bytes: maximum total size of the batch in bytes; defaults to 64 kB.

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md
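To make the effect of the two limits concrete, here is a toy Python model of greedy batching under a row cap and a byte cap. `group_into_batches` is a hypothetical helper written only for illustration; it is not the connector's implementation, and rows are represented just by their serialized size in bytes (an assumption). The 64 kB default simply mirrors the figure quoted above. In a real job you would instead set the properties on your `SparkConf` (for example `conf.set("spark.cassandra.output.batch.size.rows", "...")`) or pass them with `--conf` to `spark-submit`.

```python
from typing import Iterable, List, Optional


def group_into_batches(
    row_sizes: Iterable[int],
    max_rows: Optional[int] = None,
    max_bytes: int = 64 * 1024,
) -> List[List[int]]:
    """Hypothetical toy model: greedily pack rows (given by their size in
    bytes) into batches, closing a batch when adding the next row would
    exceed max_rows (if set) or max_bytes. A single row larger than
    max_bytes still gets its own batch instead of being dropped."""
    if max_rows is not None and max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    batches: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in row_sizes:
        too_many_rows = max_rows is not None and len(current) >= max_rows
        too_many_bytes = bool(current) and current_bytes + size > max_bytes
        if too_many_rows or too_many_bytes:
            # Close the current batch and start a new one with this row.
            batches.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Ten 1000-byte rows under a 4096-byte cap: four rows fit per batch.
    print([len(b) for b in group_into_batches([1000] * 10, max_bytes=4096)])  # prints [4, 4, 2]
```

Lowering either cap produces more, smaller batches, so each individual write request hands a replica less data at once; whether that is enough to stop a slow node from falling behind depends on the cluster.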

But you can most likely increase your throughput more by making sure all of the hard drives in your cluster are SSDs and that the commitlog and Spark directories are also on SSDs.
