Find documents with empty string value on elasticsearch

不知归路 2020-12-03 09:46

I've been trying to filter with Elasticsearch for only those documents that contain an empty string in their body. So far I'm having no luck.

Before I go on, I should

12 answers
  • 2020-12-03 10:11

    How you find an empty string in a field depends heavily on that field's mapping, in other words its index/analyzer settings.

    If the field's index setting is not_analyzed, the indexed token is just the empty string itself, so a plain term query finds it:

    {"from": 0, "size": 100, "query":{"term": {"name":""}}}

    Otherwise, if the field is analyzed, I believe most analyzers treat an empty string as a null value, so you can use the missing filter to find it:

    {"filter": {"missing": {"existence": true, "field": "name", "null_value": true}}, "query": {"match_all": {}}}
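    To make the difference between the two cases concrete, here is a toy model in Python. It is not Elasticsearch's real analysis chain: `analyzed_tokens` and `not_analyzed_tokens` are hypothetical helpers that only imitate the idea that an analyzer emits no token for an empty string, while a not_analyzed field keeps the whole value as one token.

```python
import re

def analyzed_tokens(value):
    """Toy stand-in for an analyzed field: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", value.lower())

def not_analyzed_tokens(value):
    """Toy stand-in for a not_analyzed field: the whole value is a single token."""
    return [value]

# Compare what each toy "field type" would index for an empty and a non-empty value.
for value in ["", "Foo Bar"]:
    print(repr(value), analyzed_tokens(value), not_analyzed_tokens(value))
```

    In this model the empty string leaves nothing for a term query to match on the analyzed field, which is why that case falls back to a missing-style filter.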

    here is the gist script you can reference: https://gist.github.com/hxuanji/35b982b86b3601cb5571

    BTW, I checked the commands you provided, and it seems you DON'T want the empty-string documents. All of my commands above find them, so just put the one you use into the must_not part of a bool query. My ES is 1.0.1.
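    A minimal sketch of that must_not wrapping, assuming the not_analyzed case where a term query on "" matches. The Python below only builds the request body and never contacts a cluster; the helper name `exclude_empty_string` is hypothetical.

```python
import json

def exclude_empty_string(field):
    """Build a search body that skips documents whose `field` is exactly ""."""
    return {
        "query": {
            "bool": {
                # The term query matches the empty-string token; must_not inverts it.
                "must_not": [{"term": {field: ""}}]
            }
        }
    }

body = exclude_empty_string("name")
print(json.dumps(body))
```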


    For ES 1.3.0, the gist I provided currently cannot find the empty string. It seems this has been reported: https://github.com/elasticsearch/elasticsearch/issues/7348 . Let's wait and see how it goes.

    Anyway, that issue also provides another command to find them:

    { "query": { "filtered": { "filter": { "not": { "filter": { "range": { "name": { } } } } } } } }

    Here name is the field to search for the empty string. I've tested it on ES 1.3.2.

  • 2020-12-03 10:13

    Even with the default analyzer you can do this kind of search: use a script filter, which is slower but can handle the empty string:

    curl -XPOST 'http://localhost:9200/test/demo/_search' -d '
    {
     "query": {
       "filtered": {
         "filter": {
           "script": {
             "script": "_source._content.length() == 0"
           }
         }
       }
     }
    }'
    

    It will return the documents whose _content is an empty string, without needing a special mapping.

    As pointed out by @js_gandalf, this is deprecated for ES > 5.0. Instead you should use query->bool->filter->script, as in https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
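    As a rough sketch of that newer shape, the Python below only builds the request body and does not talk to a cluster. Several things here are assumptions rather than part of the original answer: the script language is Painless, the script text goes under a `source` key (older 5.x releases used `inline`), and the field is read through a hypothetical `_content.keyword` sub-field with doc values instead of `_source`. The helper name `empty_string_script_query` is made up for illustration.

```python
import json

def empty_string_script_query(keyword_field):
    """Build a bool/filter/script search body (a sketch, not verified against a cluster)."""
    # Assumption: keyword_field is a keyword-typed field that has doc values.
    source = (
        f"doc['{keyword_field}'].size() > 0 && "
        f"doc['{keyword_field}'].value.length() == 0"
    )
    return {
        "query": {
            "bool": {
                "filter": {
                    "script": {"script": {"lang": "painless", "source": source}}
                }
            }
        }
    }

body = empty_string_script_query("_content.keyword")
print(json.dumps(body, indent=2))
```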

  • 2020-12-03 10:14

    Found a solution here: https://github.com/elastic/elasticsearch/issues/7515 . It works without reindexing.

    PUT t/t/1
    {
      "textContent": ""
    }
    
    PUT t/t/2
    {
      "textContent": "foo"
    }
    
    GET t/t/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "exists": {
                "field": "textContent"
              }
            }
          ],
          "must_not": [
            {
              "wildcard": {
                "textContent": "*"
              }
            }
          ]
        }
      }
    }
    
  • 2020-12-03 10:20

    Or, using Lucene query string syntax:

    q=yourfield.keyword:""

    See the Elasticsearch reference: https://www.elastic.co/guide/en/elasticsearch/reference/6.5/query-dsl-query-string-query.html#query-string-syntax
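    When that q parameter goes into a URL, the colon and the quotes need percent-encoding. A small sketch in Python, where the host `http://localhost:9200`, the index name `myindex`, and the helper name `empty_keyword_search_url` are placeholders and not values from this answer:

```python
from urllib.parse import urlencode

def empty_keyword_search_url(base_url, index, field):
    """Return a _search URL whose q parameter is field:"" in URL-encoded form."""
    query = f'{field}:""'
    params = urlencode({"q": query})  # percent-encodes the colon and the quotes
    return f"{base_url}/{index}/_search?{params}"

url = empty_keyword_search_url("http://localhost:9200", "myindex", "yourfield.keyword")
print(url)
```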

  • 2020-12-03 10:23

    I was trying to find the empty fields (in indexes with dynamic mapping) and set them to a default value, and the request below worked for me.

    Note: this is on Elasticsearch 7.x.

    POST <index_name|pattern>/_update_by_query
    {
      "script": {
        "lang": "painless",
        "source": """
          if (ctx._source.<field_name> == "") {
            ctx._source.<field_name> = "0";
          } else {
            ctx.op = "noop";
          }
        """
      }
    }
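    If the field name and default are only known at run time, the same body can be generated. This is a sketch under assumptions: the helper name `default_for_empty` is hypothetical, the field is read with bracket access so the name can be substituted, and the default is passed through script `params` instead of being written into the script text. It only builds the JSON body and does not send it.

```python
import json

def default_for_empty(field, default):
    """Build an _update_by_query body that replaces "" in `field` with `default`."""
    source = (
        f"if (ctx._source['{field}'] == '') {{ "
        f"ctx._source['{field}'] = params.value; "
        "} else { ctx.op = 'noop'; }"
    )
    return {
        "script": {
            "lang": "painless",
            "source": source,
            # The default travels as a parameter, so the script text stays the same per field.
            "params": {"value": default},
        }
    }

body = default_for_empty("status", "0")
print(json.dumps(body, indent=2))
```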
    

    Following one of the responses in this thread, I also came up with the request below, which does the same thing:

    POST index_pattern*/_update_by_query
    {
      "script": {
        "source": "ctx._source.field_name='0'",
        "lang": "painless"
      },
      "query": {
        "bool": {
          "must": [
            {
              "exists": {
                "field": "field_name"
              }
            }
          ],
          "must_not": [
            {
              "wildcard": {
                "field_name": "*"
              }
            }
          ]
        }
      }  
    }
    

    I was also trying to find the documents in the index that don't have the field at all and add it with a value.

    One of the responses in this thread helped me come up with the following:

    POST index_pattern*/_update_by_query
    {
      "script": {
        "source": "ctx._source.field_name='0'",
        "lang": "painless"
      },
      "query": {
        "bool": {
          "must_not": [
            {
              "exists": {
                "field": "field_name"
              }
            }
          ]
        }
      }
    }
    

    Thanks to everyone who contributed to this thread; I was able to solve my problem.

  • 2020-12-03 10:24

    For those of you using Elasticsearch 5.2 or above and still stuck: the easiest way is to reindex your data with the keyword type. Then all searches for empty values work, like this:

    "query": {
        "term": {"MY_FIELD_TO_SEARCH": ""}
    }
    

    Actually, once I reindexed my database and reran the query, it worked =)

    The problem was that my field was type text and NOT keyword. I changed the mapping to keyword and reindexed:

    curl -X PUT https://username:password@host.io:9200/mycoolindex
    
    curl -X PUT https://user:pass@host.io:9200/mycoolindex/_mapping/mycooltype -d '{
      "properties": {
        "MY_FIELD_TO_SEARCH": {
          "type": "keyword"
        }
      }
    }'
    
    curl -X POST https://username:password@host.io:9200/_reindex -d '{
     "source": {
       "index": "oldindex"
     },
     "dest": {
        "index": "mycoolindex"
     }
    }'
    

    I hope this helps someone who was as stuck as I was finding those empty values.
