Update nested field for millions of documents

余生长醉 submitted on 2020-04-14 07:29:56

Question


I use bulk updates with a script to append to a nested field, but this is very slow:

POST index/type/_bulk

{"update":{"_id":"1"}}
{"script":{"inline":"ctx._source.nestedfield.add(params.nestedfield)","params":{"nestedfield":{"field1":"1","field2":"2"}}}}
{"update":{"_id":"2"}}
{"script":{"inline":"ctx._source.nestedfield.add(params.nestedfield)","params":{"nestedfield":{"field1":"3","field2":"4"}}}}

 ... [many more, split into several batches]

Is there another way that could be faster?

It seems possible to store the script so that it is not repeated for each update, but I couldn't find a way to keep the params "dynamic".


Answer 1:


As often with performance optimization questions, there is no single answer since there are many possible causes of poor performance.

In your case you are making bulk update requests. When an update is performed, the document is actually being re-indexed:

... to update a document is to retrieve it, change it, and then reindex the whole document.

Hence it makes sense to look at the indexing performance tuning tips. The first things I would consider in your case are choosing the right bulk size, sending bulk requests from several threads, and increasing or disabling the index refresh interval.
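As a rough illustration of the bulk-size and refresh-interval points, here is a minimal Python sketch that splits the updates into batches and builds one bulk body per batch. The helper names (`update_lines`, `batches`, `bulk_bodies`) and the two settings constants are made up for this example; the script and field names are taken from the question. The settings dictionaries are meant as bodies for a `PUT index/_settings` request, and `"1s"` is only the usual default, so restore whatever value your index used before.

```python
import json
from itertools import islice

# Script text from the question. Newer Elasticsearch versions name this key
# "source" instead of "inline"; adjust for the version you run.
SCRIPT = "ctx._source.nestedfield.add(params.nestedfield)"

# Hypothetical constants: bodies for PUT index/_settings.
DISABLE_REFRESH = {"index": {"refresh_interval": "-1"}}  # before the bulk run
RESTORE_REFRESH = {"index": {"refresh_interval": "1s"}}  # after the bulk run


def update_lines(doc_id, nested_item):
    """Return the two NDJSON lines (action + script) for one scripted update."""
    action = {"update": {"_id": doc_id}}
    body = {"script": {"inline": SCRIPT, "params": {"nestedfield": nested_item}}}
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


def batches(items, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def bulk_bodies(updates, batch_size):
    """Yield one newline-terminated bulk request body per batch."""
    for chunk in batches(updates, batch_size):
        yield "".join(update_lines(doc_id, item) for doc_id, item in chunk)
```

Each yielded string can be sent as the body of `POST index/type/_bulk`. The batch size is the knob to experiment with: measure throughput at a few sizes rather than assuming a fixed number is best.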

You might also consider a ready-made client that supports parallel bulk requests, as the Python elasticsearch client does.
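With the Python client, this could look roughly like the sketch below, which relies on `elasticsearch.helpers.parallel_bulk`. The function names, the index name passed in, and the `thread_count` and `chunk_size` values are assumptions for illustration, not recommendations; the client import is kept inside the function so the action generator can be used without the package installed.

```python
# Script text from the question; newer Elasticsearch versions use the key
# "source" instead of "inline".
SCRIPT = "ctx._source.nestedfield.add(params.nestedfield)"


def scripted_update_actions(updates, index, doc_type=None):
    """Yield one helper-style action dict per (doc_id, nested_item) pair."""
    for doc_id, nested_item in updates:
        action = {
            "_op_type": "update",
            "_index": index,
            "_id": doc_id,
            "script": {"inline": SCRIPT, "params": {"nestedfield": nested_item}},
        }
        if doc_type is not None:
            # Only needed on older clusters that still use mapping types.
            action["_type"] = doc_type
        yield action


def run_parallel_updates(client, updates, index, thread_count=4, chunk_size=500):
    """Send the updates through parallel_bulk and collect the failed items."""
    from elasticsearch.helpers import parallel_bulk  # requires the elasticsearch package

    failed = []
    results = parallel_bulk(
        client,
        scripted_update_actions(updates, index),
        thread_count=thread_count,
        chunk_size=chunk_size,
        raise_on_error=False,  # report failures instead of raising on the first one
    )
    # parallel_bulk is a lazy generator, so it must be consumed for work to happen.
    for ok, result in results:
        if not ok:
            failed.append(result)
    return failed
```

Here `client` is an `Elasticsearch` instance you have already created. Because the params travel with each action, every document can still receive its own `nestedfield` value.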

Ideally, monitor Elasticsearch performance metrics to understand where the bottleneck is and whether your performance tweaks give an actual gain. Here is an overview blog post about Elasticsearch performance metrics.



Source: https://stackoverflow.com/questions/46813530/update-nested-field-for-millions-of-documents
