Blocking execute fethod from com.datastax.driver.core.Session
public ResultSet execute(Statement statement);
Comment on this method:
<
If I have n number of threads, all threads will have the same amount of records they need to send to the database. All of them continue sending multiple insert queries to cassandra using blocking execute call. If I increase the value of n, will it also helps to speed up the time that I need to insert all records to cassandra?
To some extent. Lets divorce the client implementation details a bit and look at things from the perspective of "Number of concurrent requests", as you don't need to have a thread for each ongoing request if you use executeAsync. In my testing I have found that while there is a lot of value in having a high number of concurrent requests, there is a threshold for which there are diminishing returns or performance starts to degrade. My general rule of thumb is (number of Nodes *
native_transport_max_threads (default: 128)* 2)
, but you may find more optimal results with more or less.
The idea here is that there is not much value in enqueuing more requests than cassandra will handle at a time. While reducing the number of inflight requests, you limit unnecessary congestion on the connections between your driver client and cassandra.
Question 2: With non-blocking execute, how can I assure that all of the insertions is successful? The only way I know is waiting for the ResultSetFuture to check the execution of the insert query. Is there any better way I can do ? Is there a higher chance that non-blocking execute is easier to fail then blocking execute?
Waiting on the ResultSetFuture via get
is one route, but if you are developing a fully async application, you want to avoid blocking as much as possible. Using guava, your two best weapons are Futures.addCallback and Futures.transform.
Futures.addCallback
allows you to register a FutureCallback that gets executed when the driver has received the response. onSuccess
gets executed in the success case, onFailure
otherwise.
Futures.transform
allows you to effectively map the returned ResultSetFuture
into something else. For example if you only want the value of 1 column you could use it to transform ListenableFuture
to a ListenableFuture
without having to block in your code on the ResultSetFuture
and then getting the String value.
In the context of writing a dataloader program, you could do something like the following:
Semaphore
or some other construct with a fixed number of permits (that will be your maximum number of inflight requests). Whenever you go to submit a query using executeAsync
, acquire a permit. You should really only need 1 thread (but may want to introduce a pool of # cpu cores size that does this) that acquires the permits from the Semaphore and executes queries. It will just block on acquire until there is an available permit.Futures.addCallback
for the future returned from executeAsync
. The callback should call Sempahore.release()
in both onSuccess
and onFailure
cases. By releasing a permit, this should allow your thread in step 1 to continue and submit the next request.To further improve throughput, you might want to consider using BatchStatement
and submitting requests in batches. This is a good option if you keep your batches small (50-250 is a good number) and if your inserts in a batch all share the same partition key.