Remove a field from an Elasticsearch document

长情又很酷 2020-12-02 13:00

I need to remove a field from all the documents indexed in Elasticsearch. How can I do it? Will any of the delete queries help me achieve this?

5 Answers
  •  忘掉有多难
    2020-12-02 13:24

    Elasticsearch added update_by_query in 2.3. This experimental interface allows you to do the update against all the documents that match a query.

    Internally, Elasticsearch does a scan/scroll to collect batches of documents and then updates them much like the bulk update interface does. This is faster than doing it manually with your own scan/scroll loop because it avoids the network and serialization overhead. Each record must still be loaded into RAM, modified, and then written back.
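The load, modify, write cycle described above can be pictured with a small in-memory sketch. This is only a toy model, not how Elasticsearch is implemented: the `batches` and `remove_field_in_batches` helpers are hypothetical names, plain dicts stand in for documents, and a Python list stands in for the index.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def batches(docs: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items (a stand-in for scroll pages)."""
    it = iter(docs)
    while True:
        page = list(islice(it, size))
        if not page:
            return
        yield page


def remove_field_in_batches(docs: Iterable[Dict], field: str, size: int = 2) -> List[Dict]:
    """Process documents one batch at a time, dropping `field` from each one."""
    updated = []
    for page in batches(docs, size):
        for doc in page:
            modified = dict(doc)        # load: copy the document into memory
            modified.pop(field, None)   # modify: remove the field if present
            updated.append(modified)    # write: here we just collect the result
    return updated
```

For example, `remove_field_in_batches([{"id": 1, "name_of_field": "a"}, {"id": 2}], "name_of_field")` returns the same documents without `name_of_field`, leaving the input list untouched.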

    Yesterday I removed a large field from my ES cluster. I saw sustained throughput of 10,000 records per second during the update_by_query, constrained by CPU rather than IO.

    Look into setting conflicts=proceed if the cluster has other update traffic. Otherwise the whole job will stop when it hits a ConflictError because one of the records was updated underneath one of the batches.

    Similarly, setting wait_for_completion=false will cause the update_by_query to run via the tasks interface. Otherwise the job will terminate if the connection is closed.

    url:

    http://localhost:9200/INDEX/TYPE/_update_by_query?wait_for_completion=false&conflicts=proceed
    

    POST body:

    {
      "script": "ctx._source.remove('name_of_field')",
      "query": {
        "bool": {
          "must": [
            {
              "exists": {
                "field": "name_of_field"
              }
            }
          ]
        }
      }
    }
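If you would rather issue the request from a script than from a REST console, the URL and POST body above can be assembled as follows. This is a minimal sketch using only the Python standard library; the helper names (`build_url`, `build_body`, `build_request`) are hypothetical, and `INDEX`, `TYPE`, and `name_of_field` are the same placeholders used above.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_url(host: str, index: str, doc_type: str) -> str:
    """Build the _update_by_query URL with the two query parameters discussed above."""
    params = urlencode({"wait_for_completion": "false", "conflicts": "proceed"})
    return f"{host}/{index}/{doc_type}/_update_by_query?{params}"


def build_body(field: str) -> dict:
    """Build the request body: an inline script plus an `exists` filter on the field."""
    return {
        "script": f"ctx._source.remove('{field}')",
        "query": {"bool": {"must": [{"exists": {"field": field}}]}},
    }


def build_request(host: str, index: str, doc_type: str, field: str) -> urllib.request.Request:
    """Wrap the URL and JSON-encoded body in a POST request object (nothing is sent yet)."""
    data = json.dumps(build_body(field)).encode("utf-8")
    return urllib.request.Request(
        build_url(host, index, doc_type),
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it against a running cluster, pass the result to `urllib.request.urlopen(...)` and read the JSON reply. The `exists` filter restricts the update to documents that still contain the field.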
    

    As of Elasticsearch 1.4.3, inline Groovy scripting is disabled by default. For an inline script like this to work, you'll need to enable it by adding script.inline: true to your config file.

    Or save the Groovy as a script file and reference it with the "script": { "file": "scriptname", "lang": "groovy"} format.
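For the file-based variant, the request body keeps the same query and only the "script" entry changes. A minimal sketch, assuming the script file (here the placeholder name `scriptname`) already contains the `ctx._source.remove(...)` statement for the field; `build_file_script_body` is a hypothetical helper name.

```python
import json


def build_file_script_body(script_name: str, field: str) -> dict:
    """Build an update_by_query body that references a stored script file by name."""
    return {
        # The script file itself is assumed to hold the removal statement.
        "script": {"file": script_name, "lang": "groovy"},
        # Same filter as the inline version: only touch documents that have the field.
        "query": {"bool": {"must": [{"exists": {"field": field}}]}},
    }


# Render the body as JSON, ready to paste into a POST request.
print(json.dumps(build_file_script_body("scriptname", "name_of_field"), indent=2))
```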
