I need to remove a field from all the documents indexed in Elasticsearch. How can I do it? Will any of the delete queries help me achieve this?
Question:
Answer 1:
What @backtrack said is true, but there is a very convenient way of doing this in Elasticsearch, which abstracts away the internal complexity of the deletion. You need to use the Update API to achieve this:
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{ "script" : "ctx._source.remove(\"name_of_field\")" }'
You can find more documentation in the Elasticsearch Update API reference.
Note: As of Elasticsearch 6 you are required to include a Content-Type header:
-H 'Content-Type: application/json'
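If you are scripting this rather than using curl, here is a minimal Python sketch (standard library only) that builds the same single-document Update API request, including the Content-Type header. It assumes a node on localhost:9200 and the index test, type type1 and document ID 1 from the curl example; the helper name build_remove_field_request is hypothetical. On 7.x typeless APIs the path would be /test/_update/1 instead.

```python
import json
import urllib.request


def build_remove_field_request(host, index, doc_type, doc_id, field):
    """Build (but do not send) an Update API request that drops `field` from one document."""
    # Same short string-script form as the curl example; json.dumps quotes the field name.
    body = {"script": "ctx._source.remove(%s)" % json.dumps(field)}
    url = "http://%s/%s/%s/%s/_update" % (host, index, doc_type, doc_id)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Required as of Elasticsearch 6.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, pass the request to urllib.request.urlopen(req) and read the JSON response.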
Answer 2:
Elasticsearch added update_by_query in 2.3. This experimental interface lets you apply an update to all the documents that match a query.
Internally, Elasticsearch does a scan/scroll to collect batches of documents and then updates them the same way the bulk update interface does. This is faster than doing it manually with your own scan/scroll loop, because it avoids the network and serialization overhead. Each record must still be loaded into RAM, modified and then written back.
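To make "update them like the bulk update interface" concrete, here is a rough Python sketch of what one batch looks like as a Bulk API body: newline-delimited JSON with an update action line followed by a script line per document. This is not Elasticsearch's internal code, just the equivalent client-side request; the helper name build_bulk_remove_body and the index/IDs are hypothetical, and the action lines assume 7.x-style typeless bulk (older versions also need a _type).

```python
import json


def build_bulk_remove_body(index, doc_ids, field):
    """Return an NDJSON Bulk API body that removes `field` from each listed document."""
    script = {"script": "ctx._source.remove(%s)" % json.dumps(field)}
    lines = []
    for doc_id in doc_ids:
        # Action/metadata line, then the update body (here a script) for that document.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(script))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Doing this yourself means also running the scroll and POSTing each body to /_bulk (Content-Type: application/x-ndjson), which is exactly the round-tripping update_by_query saves you.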
Yesterday I removed a large field from my ES cluster. I saw sustained throughput of 10,000 records per second during the update_by_query, constrained by CPU rather than IO.
Look into setting conflicts=proceed if the cluster has other update traffic; otherwise the whole job will stop when it hits a version conflict (ConflictError), which happens when one of the records is updated underneath one of the batches.
Similarly, setting wait_for_completion=false will cause the update_by_query to run via the tasks interface. Otherwise the job will terminate if the connection is closed.
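With wait_for_completion=false the response contains a task ID instead of the final result, and you check progress through the Tasks API (GET _tasks/&lt;task_id&gt;). Below is a minimal polling sketch; it assumes that response carries a top-level completed flag, as recent versions do (verify against your version), and the helper names fetch_json and wait_for_task are hypothetical.

```python
import json
import time
import urllib.request


def fetch_json(url):
    """GET `url` and decode the JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_task(host, task_id, fetch=fetch_json, poll_seconds=5.0, sleep=time.sleep):
    """Poll the Tasks API until the task reports completion, then return its final status."""
    url = "http://%s/_tasks/%s" % (host, task_id)
    while True:
        status = fetch(url)
        # Assumed response shape: {"completed": bool, "task": {...}, "response": {...}}
        if status.get("completed"):
            return status
        sleep(poll_seconds)
```

The fetch and sleep parameters are injectable so the loop can be exercised without a live cluster.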
URL (POST):
http://localhost:9200/index_name/_update_by_query?wait_for_completion=false&conflicts=proceed
POST body:
{ "script": "ctx._source.remove(\"name_of_field\")", "query": { "bool": { "must": [ { "exists": { "field": "name_of_field" } } ] } } }
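Putting the URL parameters and the body together, this Python sketch builds the full _update_by_query request with the standard library. It is an illustration under assumptions: index_name is a placeholder for your index, the Content-Type header assumes Elasticsearch 6+, and build_update_by_query_request is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request


def build_update_by_query_request(host, index, field):
    """Build a POST _update_by_query request that removes `field` wherever it exists."""
    params = urllib.parse.urlencode({
        "wait_for_completion": "false",  # run as a background task
        "conflicts": "proceed",          # count version conflicts instead of aborting
    })
    body = {
        "script": "ctx._source.remove(%s)" % json.dumps(field),
        # Only touch documents that actually have the field.
        "query": {"bool": {"must": [{"exists": {"field": field}}]}},
    }
    return urllib.request.Request(
        "http://%s/%s/_update_by_query?%s" % (host, index, params),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Restricting the query with exists means documents that never had the field are not rewritten, which keeps the job shorter.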
As of Elasticsearch 1.4.3, inline Groovy scripting is disabled by default. For an inline script like this to work, you'll need to enable it by adding script.inline: true to your config file (elasticsearch.yml).
Alternatively, save the Groovy code as a script file (in the config/scripts directory) and reference it with the "script": { "file": "scriptname", "lang": "groovy" } format.
Answer 3:
By default this isn't possible, because Lucene doesn't support it: you can only add or remove whole Lucene documents in a Lucene index. To remove a field you have to:
- Get the current version of your document
- Remove the field
- Index (push) this new version of your document
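The three steps above can be sketched in Python as follows. Step 1 is a plain GET /index/type/id whose JSON response contains the document under _source; the sketch covers steps 2 and 3. The helper names source_without_field and build_reindex_request are hypothetical, the /index/type/id path assumes a pre-7 typed index, and nothing here guards against another client modifying the document between the GET and the PUT.

```python
import copy
import json
import urllib.request


def source_without_field(get_response, field):
    """Step 2: return a copy of the document's _source with `field` removed."""
    source = copy.deepcopy(get_response["_source"])
    source.pop(field, None)  # no-op if the field is already absent
    return source


def build_reindex_request(host, index, doc_type, doc_id, source):
    """Step 3: build a PUT that replaces the whole document with `source`."""
    return urllib.request.Request(
        "http://%s/%s/%s/%s" % (host, index, doc_type, doc_id),
        data=json.dumps(source).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Because the PUT replaces the whole document, this is the same "remove and re-add the Lucene document" operation the Update API performs for you on the server side.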