Question
I want to write to Cassandra from a DataFrame using the spark-cassandra connector, and I want to exclude rows whose primary key already exists (upserts would still succeed, but I don't want the other columns changed). Is there a way to do that?
Thanks!
Answer 1:
You can use the ifNotExists WriteConf option, which was introduced in this PR. It works like so:
import com.datastax.spark.connector._
import com.datastax.spark.connector.writer.WriteConf
val writeConf = WriteConf(ifNotExists = true)
rdd.saveToCassandra(keyspaceName, tableName, writeConf = writeConf)
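The question asks about a DataFrame while the snippet above writes an RDD; the same WriteConf can be applied by dropping to the DataFrame's underlying RDD. A minimal sketch, assuming a DataFrame df whose first two columns are an Int id and a String name, and a hypothetical my_keyspace.users table keyed by id:
import com.datastax.spark.connector._
import com.datastax.spark.connector.writer.WriteConf

val writeConf = WriteConf(ifNotExists = true)

// saveToCassandra is defined on RDDs, so map the DataFrame rows to tuples first.
df.rdd
  .map(row => (row.getInt(0), row.getString(1)))   // (id, name)
  .saveToCassandra("my_keyspace", "users",
    SomeColumns("id", "name"), writeConf = writeConf)
With ifNotExists set, rows whose primary key already exists in the table are left untouched; only new primary keys are inserted.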
Answer 2:
You can do
sparkConf.set("spark.cassandra.output.ifNotExists", "true")
With this config, if the partition key and clustering columns of a row match a row that already exists in Cassandra, the write is ignored; otherwise the write is performed.
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/insert_r.html#reference_ds_gp2_1jp_xj__if-not-exists
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md#write-tuning-parameters
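For example, setting the property on the SparkConf makes every subsequent saveToCassandra call use IF NOT EXISTS. A minimal sketch, assuming a Cassandra node at 127.0.0.1 and a hypothetical my_keyspace.users table keyed by id:
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

val conf = new SparkConf()
  .setAppName("IfNotExistsExample")
  .set("spark.cassandra.connection.host", "127.0.0.1")  // assumed contact point
  .set("spark.cassandra.output.ifNotExists", "true")    // skip rows whose primary key already exists

val sc = new SparkContext(conf)

// With the property set, these writes are issued as INSERT ... IF NOT EXISTS.
sc.parallelize(Seq((1, "alice"), (2, "bob")))
  .saveToCassandra("my_keyspace", "users", SomeColumns("id", "name"))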
Answer 3:
Srinu, this all boils down to "read before write", whether you are using Spark or not.
But there is the IF NOT EXISTS clause:
If the column exists, it is updated. The row is created if none exists. Use IF NOT EXISTS to perform the insertion only if the row does not already exist. Using IF NOT EXISTS incurs a performance hit associated with using Paxos internally. For information about Paxos, see Cassandra 2.1 documentation or Cassandra 2.0 documentation.
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/insert_r.html
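If you want this lightweight-transaction behaviour outside of a bulk save, the connector's CassandraConnector lets you run the raw CQL yourself. A minimal sketch, assuming an existing SparkContext sc and the same hypothetical my_keyspace.users table:
import com.datastax.spark.connector.cql.CassandraConnector

// Runs a lightweight transaction (Paxos): the INSERT applies only if no row with this key exists.
CassandraConnector(sc.getConf).withSessionDo { session =>
  session.execute(
    "INSERT INTO my_keyspace.users (id, name) VALUES (1, 'alice') IF NOT EXISTS")
}
As the quoted documentation notes, this path incurs the Paxos performance cost, so it is best reserved for cases where the no-overwrite guarantee really matters.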
Source: https://stackoverflow.com/questions/41307386/how-to-insert-rows-into-cassandra-if-they-dont-exist-using-spark-cassandra-dri