How to upsert or partially update documents with a script in Elasticsearch with Spark?

Anonymous (unverified), submitted 2019-12-03 02:41:02

Question:

I have some pseudocode in Python that reads from a Kafka stream and upserts documents in Elasticsearch (incrementing a counter, view, if the document already exists).

    for message in consumer:
        msg = json.loads(message.value)
        print(msg)
        index = INDEX_NAME
        es_id = msg["id"]
        script = {"script": "ctx._source.view+=1", "upsert": msg}
        es.update(index=index, doc_type="test", id=es_id, body=script)
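For reference, the scripted-upsert semantics this loop relies on can be modeled without a cluster. The sketch below is a plain-Python stand-in, not Elasticsearch client code: the `scripted_upsert` helper, the dict-based `store`, and the sample message are all hypothetical. It assumes the default behavior of the update API, where a missing id causes the `upsert` document to be stored unchanged (the script does not run), and an existing id causes only the script to be applied.

```python
import copy


def scripted_upsert(store, doc_id, upsert_doc):
    """Model an update request carrying a script and an `upsert` document.

    `store` is a plain dict standing in for an index (hypothetical helper,
    not an Elasticsearch API).
    """
    if doc_id in store:
        # Document exists: apply the effect of "ctx._source.view += 1".
        store[doc_id]["view"] += 1
    else:
        # Document missing: the upsert body is stored unchanged.
        store[doc_id] = copy.deepcopy(upsert_doc)
    return store[doc_id]


store = {}
msg = {"id": "a1", "url": "http://example.com", "view": 1}  # made-up message
scripted_upsert(store, msg["id"], msg)  # first time seen: inserted, view == 1
scripted_upsert(store, msg["id"], msg)  # seen again: view incremented
print(store["a1"]["view"])  # prints 2
```

One consequence worth noting: the upserted message has to carry an initial `view` value, otherwise the later increment has nothing to add to.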

Since I want to run this in a distributed environment, I am using Spark Structured Streaming:

    df.writeStream \
        .format("org.elasticsearch.spark.sql") \
        .queryName("ESquery") \
        .option("es.resource", "credentials/url") \
        .option("checkpointLocation", "checkpoint") \
        .start()

or Spark Streaming in Scala, reading from a Kafka stream:

    // Initializing Spark Streaming Context and Kafka stream
    sparkConf.setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    [...]
    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topicsSet, kafkaParams)
    )
    [...]
    val urls = messages.map(record => JsonParser.parse(record.value()).values.asInstanceOf[Map[String, Any]])
    urls.saveToEs("credentials/credential")

.saveToEs(...) is part of the elasticsearch-hadoop API, documented here. Unfortunately, this repo is not well documented, so I cannot work out where to put the script.

Can anyone help me? Thank you in advance.

Answer 1:

You should be able to do it by setting the write operation to "update" (or "upsert") and passing your script through the script setting (the exact setting name depends on the ES version).

    EsSpark.saveToEs(rdd, "spark/docs", Map(
      "es.mapping.id" -> "id",
      "es.write.operation" -> "update",
      "es.update.script.inline" -> "your script"
    ))

You probably want to use "upsert" rather than "update", so that documents that do not exist yet are created.
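The same settings can be passed as options to the Structured Streaming writer from the question. This is a minimal sketch, assuming a connector version that understands `es.update.script.inline` (older versions use `es.update.script`), that the script language is Painless, and that each row carries an `id` and an initial `view`. The `es_upsert_options` and `apply_options` helpers are hypothetical names introduced here, not part of any library.

```python
def es_upsert_options(id_field, script, lang="painless"):
    """Build elasticsearch-hadoop write options for a scripted upsert."""
    return {
        "es.mapping.id": id_field,          # field used as the document _id
        "es.write.operation": "upsert",     # insert if missing, else update
        "es.update.script.inline": script,  # script applied to existing docs
        "es.update.script.lang": lang,      # assumption: Painless
    }


def apply_options(writer, options):
    """Chain .option(key, value) calls on any Spark writer-like object."""
    for key, value in options.items():
        writer = writer.option(key, value)
    return writer


options = es_upsert_options("id", "ctx._source.view += 1")

# Usage with the writer from the question. It needs PySpark and the
# elasticsearch-hadoop jar on the classpath, so it is left commented out:
# writer = df.writeStream.format("org.elasticsearch.spark.sql") \
#     .queryName("ESquery") \
#     .option("es.resource", "credentials/url") \
#     .option("checkpointLocation", "checkpoint")
# apply_options(writer, options).start()
```

Keeping the options in a dict makes it easy to swap the script setting name when targeting a different connector version, which is the part the answer flags as version-dependent.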

There are some good unit tests for the Cascading integration in the same library. Those settings should also work for Spark, since both use the same writer.

I suggest reading the unit tests to pick the correct settings for your ES version.


