elasticsearch query string dont search by word part

北慕城南 提交于 2019-12-17 13:42:52

问题


I'm sending this request

curl -XGET 'host/process_test_3/14/_search' -d '{
  "query" : {
    "query_string" : {
      "query" : "\"*cor interface*\"",
      "fields" : ["title", "obj_id"]
    }
  }
}'

And I'm getting correct result

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 5.421598,
    "hits": [
      {
        "_index": "process_test_3",
        "_type": "14",
        "_id": "141_dashboard_14",
        "_score": 5.421598,
        "_source": {
          "obj_type": "dashboard",
          "obj_id": "141",
          "title": "Cor Interface Monitoring"
        }
      }
    ]
  }
}

But when I want to search by word part, as example

curl -XGET 'host/process_test_3/14/_search' -d '
{
  "query" : {
    "query_string" : {
      "query" : "\"*cor inter*\"",
      "fields" : ["title", "obj_id"]
    }
  }
}'

I'm getting no results back:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : []
  }
}

What am I doing wrong?


回答1:


This is because your title field has probably been analyzed by the standard analyzer (default setting) and the title Cor Interface Monitoring has been tokenized as the three tokens cor, interface and monitoring.

In order to search any substring of words, you need to create a custom analyzer which leverages the ngram token filter in order to also index all substrings of each of your tokens.

You can create your index like this:

curl -XPUT localhost:9200/process_test_3 -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "substring_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "substring"]
        }
      },
      "filter": {
        "substring": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "14": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "substring_analyzer"
        }
      }
    }
  }
}'

Then you can reindex your data. What this will do is that the title Cor Interface Monitoring will now be tokenized as:

  • co, cor, or
  • in, int, inte, inter, interf, etc
  • mo, mon, moni, etc

so that your second search query will now return the document you expect because the tokens cor and inter will now match.




回答2:


+1 to Val's solution. Just wanted to add something. Since your query is relatively simple, you may want to have a look at match/match_phrase queries. Match queries does have the regex parsing like query_string and are thus lighter. You can find the details here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html



来源:https://stackoverflow.com/questions/34331249/elasticsearch-query-string-dont-search-by-word-part

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!