I have some Python pseudocode that reads from a Kafka stream and upserts documents into Elasticsearch, incrementing a view counter if the document already exists (TOPIC_NAME and INDEX_NAME are placeholders for my actual values):
    import json
    from kafka import KafkaConsumer
    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    consumer = KafkaConsumer(TOPIC_NAME)

    for message in consumer:
        msg = json.loads(message.value)
        print(msg)
        index = INDEX_NAME
        es_id = msg["id"]
        # increment the counter if the doc exists; otherwise insert msg as the new document
        script = {"script": "ctx._source.view += 1", "upsert": msg}
        es.update(index=index, doc_type="test", id=es_id, body=script)
Since I want to run this in a distributed environment, I am using Spark Structured Streaming:
    df.writeStream \
        .format("org.elasticsearch.spark.sql") \
        .queryName("ESquery") \
        .option("es.resource", "credentials/url") \
        .option("checkpointLocation", "checkpoint") \
        .start()
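From skimming the elasticsearch-hadoop configuration page, I suspect the script would have to be passed as extra es.* options on the writer rather than inside the data itself. This is only a guess (written in Scala, untested; es.write.operation, es.mapping.id, es.update.script.inline and es.update.script.lang are the option names I found there, and "id" is the id field of my messages):

    df.writeStream
      .format("org.elasticsearch.spark.sql")
      .queryName("ESquery")
      .option("es.resource", "credentials/url")
      .option("es.write.operation", "upsert")                      // update existing docs, insert new ones
      .option("es.mapping.id", "id")                               // take the document id from the "id" field
      .option("es.update.script.inline", "ctx._source.view += 1")  // script applied when the doc already exists
      .option("es.update.script.lang", "painless")
      .option("checkpointLocation", "checkpoint")
      .start()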
or Spark Streaming in Scala that reads from a Kafka stream:
    // Initializing the Spark Streaming context and the Kafka stream
    sparkConf.setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    [...]
    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topicsSet, kafkaParams)
    )
    [...]
    val urls = messages.map(record =>
      JsonParser.parse(record.value()).values.asInstanceOf[Map[String, Any]]
    )
    urls.saveToEs("credentials/credential")
saveToEs(...) is the API of elasticsearch-hadoop, documented here. Unfortunately that repo is not well documented, so I cannot figure out where to put the script command.
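My best guess so far is that the script does not go into the data or the saveToEs call itself, but into the es.* settings, e.g. via the saveToEs overload that takes a configuration map. A minimal untested sketch (option names assumed from the configuration reference; older elasticsearch-hadoop versions apparently call the inline-script key es.update.script instead of es.update.script.inline):

    import org.elasticsearch.spark.streaming._  // brings saveToEs into scope for DStreams

    val esCfg = Map(
      "es.write.operation"      -> "upsert",                 // upsert instead of plain index
      "es.mapping.id"           -> "id",                     // document id comes from the "id" field
      "es.update.script.inline" -> "ctx._source.view += 1",  // run only when the document already exists
      "es.update.script.lang"   -> "painless"
    )

    urls.saveToEs("credentials/credential", esCfg)

If I understand the docs correctly, the mapped message itself would then be used as the upsert document when no document with that id exists yet.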
Can anyone tell me whether this is the right place for the script command, or where it should go instead? Thanks in advance.