Remove a field from an Elasticsearch document

Anonymous (unverified), submitted 2019-12-03 01:20:02

Question:

I need to remove a field from all the documents indexed in Elasticsearch. How can I do it? Will any of the delete queries help me achieve this?

Answer 1:

What @backtrack said is true, but there is a very convenient way of doing this in Elasticsearch, which abstracts away the internal complexity of the deletion. Use the update API with a script that removes the field:

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{     "script" : "ctx._source.remove(\"name_of_field\")" }' 

More details are in the Elasticsearch update API documentation.

Note: As of Elasticsearch 6 you are required to include a Content-Type header:

-H 'Content-Type: application/json' 
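For anyone scripting this rather than typing curl, here is a minimal Python sketch of the same request with the header included. It only builds the request and does not send it. The function name build_remove_field_request is hypothetical, and the index, type and id values simply mirror the test/type1/1 path used in the curl command.

```python
import json
import urllib.request


def build_remove_field_request(base_url, index, doc_type, doc_id, field):
    """Build (but do not send) the _update request that removes one field."""
    # Same script string as in the curl command; json.dumps escapes the quotes.
    body = {"script": 'ctx._source.remove("%s")' % field}
    url = "%s/%s/%s/%s/_update" % (base_url, index, doc_type, doc_id)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Header required as of Elasticsearch 6, per the note above.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_remove_field_request(
    "http://localhost:9200", "test", "type1", "1", "name_of_field"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the result to urllib.request.urlopen(req) would send it to a running cluster.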


Answer 2:

Elasticsearch added update_by_query in 2.3. This experimental interface allows you to do the update against all the documents that match a query.

Internally, Elasticsearch does a scan/scroll to collect batches of documents and then updates them much as the bulk update interface would. This is faster than doing it manually with your own scan/scroll loop because it avoids the network and serialization overhead. Each record must still be loaded into RAM, modified, and then written back.
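To make the batching concrete, here is a rough sketch of the manual approach that update_by_query replaces: split the scrolled document ids into batches and build one _bulk body of scripted updates per batch. The helper names are hypothetical, the _index/_type/_id action line follows the typed layout of that era, and the script uses the same string form as elsewhere in this thread, so adjust both for your version.

```python
import json


def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_remove_field_body(index, doc_type, doc_ids, field):
    """Build one newline-delimited _bulk body of scripted updates."""
    script = 'ctx._source.remove("%s")' % field
    lines = []
    for doc_id in doc_ids:
        # Action line first, then the update payload for that document.
        lines.append(json.dumps(
            {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}
        ))
        lines.append(json.dumps({"script": script}))
    # A _bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


# Ids that a scan/scroll would have returned (made up for illustration).
scrolled_ids = ["1", "2", "3", "4", "5"]
for batch in batches(scrolled_ids, 2):
    print(bulk_remove_field_body("test", "type1", batch, "name_of_field"))
```

Every one of those bodies would have to cross the network in both directions, which is the overhead the server-side version avoids.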

Yesterday I removed a large field from my ES cluster. I saw sustained throughput of 10,000 records per second during the update_by_query, constrained by CPU rather than IO.

Look into setting conflicts=proceed if the cluster has other update traffic. Otherwise the whole job stops at the first version conflict, which happens when a record is updated underneath one of the batches.

Similarly, setting wait_for_completion=false makes the update_by_query run via the tasks interface. Otherwise the job will terminate if the connection is closed.

URL:

http://localhost:9200/type/_update_by_query?wait_for_completion=false&conflicts=proceed

POST body:

{
  "script": "ctx._source.remove(\"name_of_field\")",
  "query": {
    "bool": {
      "must": [
        { "exists": { "field": "name_of_field" } }
      ]
    }
  }
}
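As a sketch, the URL and POST body can be assembled in Python like this. Nothing is sent to a cluster. The function names are hypothetical, and note that the query-string parameter is spelled conflicts. The last helper assumes the response to a wait_for_completion=false request contains a task id that can be looked up under _tasks, so check that against your version.

```python
import json
from urllib.parse import urlencode


def update_by_query_url(base_url, index):
    """URL for a background update_by_query that tolerates version conflicts."""
    params = urlencode({"wait_for_completion": "false", "conflicts": "proceed"})
    return "%s/%s/_update_by_query?%s" % (base_url, index, params)


def remove_field_query_body(field):
    """Body that removes `field`, restricted to documents that still have it."""
    return {
        "script": 'ctx._source.remove("%s")' % field,
        "query": {"bool": {"must": [{"exists": {"field": field}}]}},
    }


def task_status_url(base_url, task_id):
    """URL for checking on the background task (task_id comes from the response)."""
    return "%s/_tasks/%s" % (base_url, task_id)


print(update_by_query_url("http://localhost:9200", "type"))
print(json.dumps(remove_field_query_body("name_of_field"), indent=2))
```

The exists clause keeps the job from rewriting documents that no longer have the field, which matters if the job has to be restarted partway through.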

As of Elasticsearch 1.4.3, inline Groovy scripting is disabled by default. For an inline script like this to work, you need to enable it by adding script.inline: true to your config file.

Alternatively, upload the Groovy code as a script file and use the "script": { "file": "scriptname", "lang": "groovy"} format.



Answer 3:

By default it's not possible, because right now Lucene doesn't support removing a single field in place. Basically, you can only add or remove whole Lucene documents in Lucene indices. So to drop a field you have to rewrite the whole document:

  1. Get the current version of your doc
  2. Remove the field
  3. Push this new version of your doc
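The three steps above can be sketched in Python as follows. The get_doc and put_doc callables are hypothetical placeholders for whatever client you use to fetch and index a document, and a plain dict stands in for the index here so the example runs on its own.

```python
def remove_field_and_reindex(get_doc, put_doc, doc_id, field):
    """Fetch a document, drop one field, and write the whole document back."""
    source = dict(get_doc(doc_id))  # 1. get the current version (copied)
    source.pop(field, None)         # 2. remove the field if it is present
    put_doc(doc_id, source)         # 3. push the new version
    return source


# In-memory stand-in for an index, for illustration only.
store = {"1": {"title": "hello", "name_of_field": "obsolete"}}
remove_field_and_reindex(store.__getitem__, store.__setitem__, "1", "name_of_field")
print(store["1"])  # {'title': 'hello'}
```

Against a real cluster the write in step 3 can race with other writers, which is the same version-conflict concern raised in Answer 2.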

