How to multi insert rows in cassandra

Submitted by 会有一股神秘感。 on 2020-01-01 08:18:57

Question


What is the most efficient way of inserting multiple rows into a Cassandra column family? Is it possible to do this in a single call?

Right now my approach is to add insertions for multiple columns and then execute, so a single call persists only one row. I am looking for a strategy that lets me do a batch insert.


Answer 1:


CQL has a BEGIN BATCH ... APPLY BATCH statement that groups multiple inserts, so you can build a series of requests and execute them in one call (see http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0).

The following worked for me (Java):

// Each fragment ends with a space so the concatenated CQL stays well-formed.
PreparedStatement ps = session.prepare(
"BEGIN BATCH " +
"INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?); " +
"INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?); " +
"INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?); " +
"APPLY BATCH");

session.execute(ps.bind(uid, mid1, title1, body1, uid, mid2, title2, body2, uid, mid3, title3, body3));
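
When the number of rows varies, the batch text and its flat parameter list can be generated instead of typed out. Here is a minimal sketch in Python; build_batch_cql and flatten_rows are hypothetical helper names, and the sample rows are made up for illustration. It only builds the string and the parameter list, it does not talk to a cluster:

```python
def build_batch_cql(insert_cql, row_count):
    """Return a BEGIN BATCH ... APPLY BATCH string repeating insert_cql row_count times."""
    if row_count < 1:
        raise ValueError("row_count must be at least 1")
    statement = insert_cql.strip().rstrip(";")
    # Joining with spaces keeps "BEGIN BATCH" from running into the first INSERT.
    parts = ["BEGIN BATCH"] + [statement + ";"] * row_count + ["APPLY BATCH"]
    return " ".join(parts)


def flatten_rows(rows):
    """Flatten [(a, b), (c, d)] into [a, b, c, d] to line up with the bind markers."""
    return [value for row in rows for value in row]


insert = "INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?)"
rows = [
    ("u1", 1, "first", "hello"),   # illustrative values only
    ("u1", 2, "second", "world"),
]
batch_cql = build_batch_cql(insert, len(rows))
params = flatten_rows(rows)
```

The resulting string and parameter list would then go through the driver's prepare and bind calls, the same way as in the snippet above.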

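If you don't know in advance which statements you want to execute, you can use the following syntax (Scala):

import com.datastax.driver.core.{BatchStatement, PreparedStatement}

val statement: PreparedStatement = session.prepare("INSERT INTO people (name, age) VALUES (?, ?)")
val batchStmt = new BatchStatement()
// Call bind on the PreparedStatement so each row gets its own BoundStatement;
// re-binding one shared BoundStatement would overwrite the earlier values.
// Bound values must match the column types (age is bound as a string here, so the column is assumed to be text).
batchStmt.add(statement.bind("User A", "10"))
batchStmt.add(statement.bind("User B", "12"))
session.execute(batchStmt)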

Note: a BatchStatement can hold at most 65535 statements. I learned that the hard way. :-)
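
Whatever the exact cap, a long list of rows has to be split into several batches before it is sent. A small sketch of that splitting step in Python; chunked is a hypothetical helper, and the default of 100 is an arbitrary assumption, not a recommended value:

```python
def chunked(items, max_statements=100):
    """Yield consecutive slices of items, each with at most max_statements entries."""
    if max_statements < 1:
        raise ValueError("max_statements must be positive")
    for start in range(0, len(items), max_statements):
        yield items[start:start + max_statements]


rows = [("User %d" % i, i) for i in range(250)]  # made-up sample data
batches = list(chunked(rows, max_statements=100))
# Each element of `batches` would be added to its own batch and executed separately.
```

In practice you will probably want chunks far smaller than the hard limit, since very large batches put a lot of load on the coordinator node.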




Answer 2:


PreparedStatement and binding values might be a better option. Below are a couple of good articles on uses and misuses of Batch:

Cassandra: Batch loading without the Batch keyword.

Using and misusing batches




Answer 3:


There is a batch insert operation in Cassandra. You can batch together inserts, even in different column families, to make insertion more efficient.

In Hector, you can use HFactory.createMutator then use the add methods on the returned Mutator to add operations to your batch. When ready, call execute().

If you're using CQL, then you group things into a batch by starting the batch with BEGIN BATCH and ending with APPLY BATCH.




Answer 4:


You can put your insert statements into a file and execute the file with 'cqlsh -f'.

You can also perform a batch insert with CQL, as described at the following link: http://www.datastax.com/documentation/cassandra/1.2/index.html#cassandra/cql_reference/batch_r.html
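
If the rows start out in a program rather than in a file, the file for 'cqlsh -f' can be generated. A rough sketch in Python follows; write_insert_file and cql_literal are hypothetical helpers, the people table and its columns are borrowed from the first answer, and only text and integer values are handled:

```python
import os
import tempfile


def cql_literal(value):
    """Render a Python str or int as a CQL literal (sketch: only these two types)."""
    if isinstance(value, bool) or not isinstance(value, (str, int)):
        raise TypeError("only str and int are handled in this sketch")
    if isinstance(value, int):
        return str(value)
    # CQL string literals are single-quoted; an embedded quote is doubled.
    return "'" + value.replace("'", "''") + "'"


def write_insert_file(path, table, columns, rows):
    """Write one INSERT statement per row to path and return the number written."""
    column_list = ", ".join(columns)
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            values = ", ".join(cql_literal(v) for v in row)
            handle.write("INSERT INTO %s (%s) VALUES (%s);\n" % (table, column_list, values))
    return len(rows)


path = os.path.join(tempfile.mkdtemp(), "inserts.cql")
count = write_insert_file(path, "people", ["name", "age"], [("User A", 10), ("O'Brien", 12)])
with open(path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
```

The generated file can then be run with 'cqlsh -f inserts.cql'. The table name would need to be qualified with its keyspace, or the file would need to start with a USE statement.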




Answer 5:


When inserting multiple rows, the round-trip time (RTT) of the database connection can be the performance bottleneck. In that case, we need a way to avoid waiting for one INSERT to finish before starting the next. As far as I know, there are currently two ways:

  • If data consistency matters, use a LOGGED BATCH, but as this question said, BATCH may not give a performance boost in every situation.
  • Otherwise, use an async API in the Cassandra client library; for example, in Python there is an execute_async method.

Also, you can prepare the CQL statement before executing it. I haven't tested the overall performance of a prepared statement against a plain insert, but I think that with thousands of INSERTs or more you should get a performance boost.
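
To make the async option concrete, here is a minimal Python sketch. insert_rows_async is a hypothetical helper, and session and prepared are assumed to come from an already connected DataStax Python driver session. It relies only on execute_async returning a future-like object with a result() method:

```python
def insert_rows_async(session, prepared, rows):
    """Send every row without waiting in between, then wait for all results.

    Returns a list of (row, exception) pairs for the inserts that failed.
    """
    # execute_async returns right away, so the requests overlap on the wire.
    futures = [session.execute_async(prepared, row) for row in rows]
    failures = []
    for row, future in zip(rows, futures):
        try:
            future.result()  # blocks until this insert finishes or raises
        except Exception as exc:  # collect failures instead of stopping at the first
            failures.append((row, exc))
    return failures


# Assumed usage with a connected session (not executed here):
# prepared = session.prepare("INSERT INTO people (name, age) VALUES (?, ?)")
# failures = insert_rows_async(session, prepared, [("User A", 10), ("User B", 12)])
```

This sketch sends everything at once, which is fine for a few hundred rows. For larger volumes you would want to cap the number of in-flight requests; the Python driver's cassandra.concurrent module has helpers for that.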



Source: https://stackoverflow.com/questions/17885238/how-to-multi-insert-rows-in-cassandra
