Question
I'm sending this request:
curl -XGET 'host/process_test_3/14/_search' -d '{
  "query" : {
    "query_string" : {
      "query" : "\"*cor interface*\"",
      "fields" : ["title", "obj_id"]
    }
  }
}'
and I'm getting the correct result:
{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 5.421598,
    "hits": [
      {
        "_index": "process_test_3",
        "_type": "14",
        "_id": "141_dashboard_14",
        "_score": 5.421598,
        "_source": {
          "obj_type": "dashboard",
          "obj_id": "141",
          "title": "Cor Interface Monitoring"
        }
      }
    ]
  }
}
But when I search by part of a word, for example:
curl -XGET 'host/process_test_3/14/_search' -d '
{
  "query" : {
    "query_string" : {
      "query" : "\"*cor inter*\"",
      "fields" : ["title", "obj_id"]
    }
  }
}'
I'm getting no results back:
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : []
  }
}
What am I doing wrong?
Answer 1:
This is because your title field has probably been analyzed by the standard analyzer (the default setting), so the title Cor Interface Monitoring has been tokenized as the three tokens cor, interface and monitoring. Your second query looks for the term inter, which is not one of those tokens, so nothing matches.
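To make the mismatch concrete, here is a minimal Python sketch. It does not call Elasticsearch; the helper simple_tokens is a hypothetical stand-in that only approximates the standard analyzer by lowercasing and splitting on non-alphanumeric characters.

```python
import re


def simple_tokens(text):
    """Rough approximation of the standard analyzer (an assumption, not
    the real implementation): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


title_tokens = simple_tokens("Cor Interface Monitoring")
print(title_tokens)

# A query term only matches if it equals an indexed token exactly.
for term in simple_tokens("cor inter"):
    print(term, term in title_tokens)
```

Under this approximation, cor is found among the indexed tokens but inter is not, which is why the phrase as a whole finds nothing.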
In order to search any substring of words, you need to create a custom analyzer which leverages the ngram token filter in order to also index all substrings of each of your tokens.
You can create your index like this:
curl -XPUT localhost:9200/process_test_3 -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "substring_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "substring"]
        }
      },
      "filter": {
        "substring": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "14": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "substring_analyzer"
        }
      }
    }
  }
}'
Then you can reindex your data. After that, the title Cor Interface Monitoring will be tokenized as:
co, cor, or
in, int, inte, inter, interf, etc.
mo, mon, moni, etc.
so that your second search query will now return the document you expect, because the tokens cor and inter will now match.
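The effect of the ngram filter can be sketched in plain Python. This is only an illustration of the idea, assuming min_gram 2 and max_gram 15 as in the settings above; the function names ngrams and substring_tokens are made up for this example and are not part of Elasticsearch.

```python
def ngrams(token, min_gram=2, max_gram=15):
    """Return every substring of token whose length is between
    min_gram and max_gram (inclusive)."""
    out = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            out.append(token[start:start + size])
    return out


def substring_tokens(text):
    """Lowercase, split on whitespace (enough for this title),
    then collect the ngrams of every word."""
    grams = set()
    for token in text.lower().split():
        grams.update(ngrams(token))
    return grams


indexed = substring_tokens("Cor Interface Monitoring")
print(sorted(ngrams("cor")))   # ['co', 'cor', 'or']
print("inter" in indexed)      # True
```

Note that the filter emits all substrings in the length range, not only prefixes, so the index also contains tokens such as nter or face.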
Answer 2:
+1 to Val's solution.
Just wanted to add something.
Since your query is relatively simple, you may want to have a look at match / match_phrase queries. Match queries do not have the regex parsing that query_string has and are thus lighter.
You can find the details here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
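As a rough sketch of what such a request body could look like, the Python snippet below builds a match_phrase query for the title field and prints it as JSON. The helper match_phrase_body is hypothetical, and the example targets a single field only; it is meant to show the shape of the body, not a drop-in replacement for the original query.

```python
import json


def match_phrase_body(field, phrase):
    """Build a match_phrase request body for one field."""
    return {"query": {"match_phrase": {field: phrase}}}


body = match_phrase_body("title", "cor interface")
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d in the same way as the query_string bodies above.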
Source: https://stackoverflow.com/questions/34331249/elasticsearch-query-string-dont-search-by-word-part