Question
I want to write to Cassandra from a DataFrame using the spark-cassandra connector, and I want to exclude rows whose primary key already exists (upserts would still succeed, but I don't want the other columns changed). Is there a way to do that?
Thanks!
Answer 1:
You can use the ifNotExists WriteConf option, which was introduced in this PR. It works like so:
import com.datastax.spark.connector._
import com.datastax.spark.connector.writer.WriteConf
val writeConf = WriteConf(ifNotExists = true)
rdd.saveToCassandra(keyspaceName, tableName, writeConf = writeConf)
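The question asks about a DataFrame while the snippet above writes an RDD; the same WriteConf can be applied by dropping to the DataFrame's underlying RDD. A minimal sketch, assuming a DataFrame df whose first two columns are an Int id and a String name, and a hypothetical my_keyspace.users table keyed by id:
import com.datastax.spark.connector._
import com.datastax.spark.connector.writer.WriteConf

val writeConf = WriteConf(ifNotExists = true)

// saveToCassandra is defined on RDDs, so map the DataFrame rows to tuples first.
df.rdd
  .map(row => (row.getInt(0), row.getString(1)))   // (id, name)
  .saveToCassandra("my_keyspace", "users",
    SomeColumns("id", "name"), writeConf = writeConf)
With ifNotExists set, rows whose primary key already exists in the table are left untouched; only new primary keys are inserted.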
Answer 2:
You can do
sparkConf.set("spark.cassandra.output.ifNotExists", "true")
With this config, if the partition key and clustering columns of a row match a row that already exists in Cassandra, the write is ignored; otherwise the write is performed.
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/insert_r.html#reference_ds_gp2_1jp_xj__if-not-exists
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md#write-tuning-parameters
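For example, setting the property on the SparkConf makes every subsequent saveToCassandra call use IF NOT EXISTS. A minimal sketch, assuming a Cassandra node at 127.0.0.1 and a hypothetical my_keyspace.users table keyed by id:
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

val conf = new SparkConf()
  .setAppName("IfNotExistsExample")
  .set("spark.cassandra.connection.host", "127.0.0.1")  // assumed contact point
  .set("spark.cassandra.output.ifNotExists", "true")    // skip rows whose primary key already exists

val sc = new SparkContext(conf)

// With the property set, these writes are issued as INSERT ... IF NOT EXISTS.
sc.parallelize(Seq((1, "alice"), (2, "bob")))
  .saveToCassandra("my_keyspace", "users", SomeColumns("id", "name"))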
Answer 3:
Srinu, this all boils down to "read before write", whether you are using Spark or not.
But there is the IF NOT EXISTS clause:
If the column exists, it is updated. The row is created if none exists. Use IF NOT EXISTS to perform the insertion only if the row does not already exist. Using IF NOT EXISTS incurs a performance hit associated with using Paxos internally. For information about Paxos, see Cassandra 2.1 documentation or Cassandra 2.0 documentation.
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/insert_r.html
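If you want this lightweight-transaction behaviour outside of a bulk save, the connector's CassandraConnector lets you run the raw CQL yourself. A minimal sketch, assuming an existing SparkContext sc and the same hypothetical my_keyspace.users table:
import com.datastax.spark.connector.cql.CassandraConnector

// Runs a lightweight transaction (Paxos): the INSERT applies only if no row with this key exists.
CassandraConnector(sc.getConf).withSessionDo { session =>
  session.execute(
    "INSERT INTO my_keyspace.users (id, name) VALUES (1, 'alice') IF NOT EXISTS")
}
As the quoted documentation notes, this path incurs the Paxos performance cost, so it is best reserved for cases where the no-overwrite guarantee really matters.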
Source: https://stackoverflow.com/questions/41307386/how-to-insert-rows-into-cassandra-if-they-dont-exist-using-spark-cassandra-dri