How to insert rows into Cassandra if they don't exist using the spark-cassandra driver?

可紊 submitted on 2019-12-11 04:14:21

Question


I want to write a data frame to Cassandra using the spark-cassandra connector, but skip any row whose primary key already exists in the table. I know that writes are normally upserts, but I don't want the other columns of an existing row to be changed. Is there a way to do that?

Thanks!


Answer 1:


You can use the ifNotExists option of WriteConf, which was introduced in this PR.

It works like so:

import com.datastax.spark.connector._
import com.datastax.spark.connector.writer.WriteConf

val writeConf = WriteConf(ifNotExists = true)
rdd.saveToCassandra(keyspaceName, tableName, writeConf = writeConf)



Answer 2:


You can set this on the Spark configuration:

sparkConf.set("spark.cassandra.output.ifNotExists", "true")

With this setting, if a row with the same partition key and clustering columns already exists in Cassandra, the write is ignored. Otherwise, the write is performed.
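Since the question starts from a data frame, the same setting can also be applied when the session is built. What follows is only a minimal sketch, assuming Spark 2.x or later with the connector on the classpath. The host `127.0.0.1`, the keyspace `my_keyspace`, the table `users`, and its columns `id` and `name` are hypothetical placeholders, and the table is assumed to exist already.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object InsertIfNotExistsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("insert-if-not-exists")
      // Hypothetical contact point; replace with your cluster's address.
      .config("spark.cassandra.connection.host", "127.0.0.1")
      // The setting from this answer: rows whose primary key already exists are skipped.
      .config("spark.cassandra.output.ifNotExists", "true")
      .getOrCreate()

    import spark.implicits._

    // Hypothetical data; column names must match the columns of the target table.
    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    // Append mode writes into the existing table without truncating it.
    df.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "users"))
      .mode(SaveMode.Append)
      .save()

    spark.stop()
  }
}
```

The master URL is left to `spark-submit` here. Running the job a second time with different `name` values for the same `id`s should leave the stored rows unchanged.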

https://docs.datastax.com/en/cql/3.1/cql/cql_reference/insert_r.html#reference_ds_gp2_1jp_xj__if-not-exists

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md#write-tuning-parameters




Answer 3:


Srinu, this all boils down to "read before write", whether or not you are using Spark.

But there is an IF NOT EXISTS clause:

If the column exists, it is updated. The row is created if none exists. Use IF NOT EXISTS to perform the insertion only if the row does not already exist. Using IF NOT EXISTS incurs a performance hit associated with using Paxos internally. For information about Paxos, see Cassandra 2.1 documentation or Cassandra 2.0 documentation.

http://docs.datastax.com/en/cql/3.1/cql/cql_reference/insert_r.html
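To make the difference between the two behaviours concrete, here is a toy in-memory model. It is not Cassandra and involves no Paxos or concurrency; the object `WriteSemantics` and its methods are hypothetical names used only to mirror the semantics quoted above, with a map from primary key to a single value column standing in for a table.

```scala
import scala.collection.mutable

// Toy stand-in for a table: primary key -> value of the other column.
// This only mirrors the two write semantics; it is not a Cassandra client.
object WriteSemantics {
  // Plain INSERT behaves as an upsert: the last write wins.
  def upsert(table: mutable.Map[Int, String], key: Int, value: String): Unit =
    table(key) = value

  // INSERT ... IF NOT EXISTS: write only when the key is absent.
  // Returns true if the row was inserted, false if the existing row was kept.
  def insertIfNotExists(table: mutable.Map[Int, String], key: Int, value: String): Boolean =
    if (table.contains(key)) false
    else {
      table(key) = value
      true
    }
}
```

In this model the check and the write are two separate steps, which is exactly the "read before write" that Cassandra has to perform atomically, and that is where the performance cost comes from.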



Source: https://stackoverflow.com/questions/41307386/how-to-insert-rows-into-cassandra-if-they-dont-exist-using-spark-cassandra-dri
