I have a service that consumes messages off of a queue at a rate that I control. I do some processing and then attempt to write to a Cassandra cluster via the Datastax Java
One possible idea not to kill the cluster is to "throttle" your calls to executeAsync e.g. after a batch of 100 (or whatever number is the best for your cluster and workload), you'll do a sleep in the client code and do a blocking call on all the 100 futures (or use Guava library to transform a list of future into a future of list)
This way, after issuing 100 async queries, you'll force the client application to wait for all of them to succeed before proceeding further. If you catch any exception when calling future.get(), you can schedule a retry. Normally the retry is already attempted by the default RetryStrategy of the Java driver.
About back-pressure signal from server, starting from CQL binary protocol V3, there is an error code that notifies the client that the coordinator is overloaded : https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v3.spec#L951
From the client, you can get this overloaded information in 2 ways: