Question
My mapping looks like the one below. As you can see, the Name field is analyzed.
{
"state":"open",
"settings":{
"index":{
"creation_date":"1453816191454",
"number_of_shards":"5",
"number_of_replicas":"1",
"version":{
"created":"1070199"
}
}
},
"mappings":{
"Product":{
"properties":{
"index":"not_analyzed",
"store":true,
"type":"string"
},
"Name":{
"store":true,
"type":"string"
},
"Number":{
"index":"not_analyzed",
"store":true,
"type":"string"
},
"id":{
"index":"no",
"store":true,
"type":"integer"
}
}
},
"aliases":[
]
}
}
When I query as follows:
"query": {
    "match_phrase": {
        "Name": "hl-2240"
    }
}
This works fine, and "hl 2240" works too, but when I type "hl2240" I don't get any results. I understand why: the name is indexed as "hl-2240", and I assume I am using the standard (default) analyzer, which tokenizes it as hl and 2240. Since there is no hl2240 token in the inverted index, nothing is found. I learned that I should use a different analyzer, but this is where I am stuck. Which analyzer can I use? Do I have to reindex, or can I apply the analyzer only at query time? If I change the analyzer used to index my data, I want to make sure I don't lose results when searching for "hl-2240" or "hl 2240".
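To make the mismatch concrete, here is a rough Python sketch. It only approximates the standard analyzer (lowercase, then split on anything that is not a letter or digit), and `standard_like_tokens` is a made-up helper for illustration, not an Elasticsearch API.

```python
import re

def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer (an assumption, not the real thing):
    lowercase the text and keep runs of letters/digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Tokens that end up in the inverted index for the stored value.
indexed = standard_like_tokens("hl-2240")

for query in ("hl-2240", "hl 2240", "hl2240"):
    terms = standard_like_tokens(query)
    # A query can only match if every one of its terms exists in the index.
    found = all(term in indexed for term in terms)
    print(query, terms, "match" if found else "no match")
```

Under this approximation, "hl-2240" and "hl 2240" both become the terms hl and 2240, while "hl2240" stays a single term that was never indexed.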
Update: The NEST code I tried for Richa's answer:
Client.CreateIndex("myIndex",
    ci => ci.Analysis(a => a
        .TokenFilters(f => f.Add("my_word_delimiter", new WordDelimiterTokenFilter
        {
            CatenateAll = true
        }))
        .Analyzers(an => an.Add("my_analyzer", new CustomAnalyzer
        {
            Tokenizer = "whitespace",
            Filter = new List<string> { "standard", "lowercase", "my_word_delimiter" }
        }))));
Answer 1:
Try this analyzer:
{
    "settings": {
        "analysis": {
            "filter": {
                "my_word_delimiter": {
                    "type": "word_delimiter",
                    "catenate_all": true <=== Notice this
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": [
                        "standard",
                        "lowercase",
                        "my_word_delimiter"
                    ]
                }
            }
        }
    }
}
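Read about catenate_all: when it is true, the word_delimiter filter also emits the concatenation of all the parts it splits a token into, so "hl-2240" additionally produces "hl2240".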
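The settings alone only define the analyzer; the Name field still has to reference it. Below is a hedged Python sketch that builds an index-creation body combining the two. It assumes the ES 1.x mapping syntax from the question ("type": "string") and shows only the Name field; `index_body` is just an illustrative variable name. As far as I know, the analyzer of an existing field cannot be changed in place, so this would go into a new index and the data would have to be reindexed.

```python
import json

# Analysis settings copied from the answer above.
analysis = {
    "filter": {
        "my_word_delimiter": {"type": "word_delimiter", "catenate_all": True}
    },
    "analyzer": {
        "my_analyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": ["standard", "lowercase", "my_word_delimiter"],
        }
    },
}

# Assumption: ES 1.x "string" mapping; only the Name field is shown here.
index_body = {
    "settings": {"analysis": analysis},
    "mappings": {
        "Product": {
            "properties": {
                "Name": {"type": "string", "store": True, "analyzer": "my_analyzer"}
            }
        }
    },
}

# Print the JSON that would be sent when creating the new index.
print(json.dumps(index_body, indent=2))
```

Because the same analyzer is then used at index time and at query time, the query string goes through the same splitting and catenation as the stored value.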
Use the following command to see how a string is tokenized:
curl -XGET "localhost:9200/index_8/_analyze?analyzer=my_analyzer&pretty=true" -d 'hl-2240'
This produces the following output, showing the tokens that hl-2240 will be indexed as:
{
"tokens" : [ {
"token" : "hl",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 0
}, {
"token" : "hl2240",
"start_offset" : 0,
"end_offset" : 7,
"type" : "word",
"position" : 0
}, {
"token" : "2240",
"start_offset" : 3,
"end_offset" : 7,
"type" : "word",
"position" : 1
} ]
}
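To see why all three spellings should then match, here is a simplified Python model of the analyzer chain. It is a sketch, not Lucene's implementation: it assumes the word_delimiter defaults (split on non-alphanumeric characters and on letter/digit transitions), and `word_delimiter_catenate_all` is a hypothetical helper name.

```python
import re

def word_delimiter_catenate_all(text):
    """Simplified model of: whitespace tokenizer + lowercase +
    word_delimiter with catenate_all. Returns (token, position) pairs."""
    out = []
    pos = 0
    for word in text.lower().split():
        # Split on non-alphanumerics and on letter/digit transitions.
        parts = re.findall(r"[a-z]+|[0-9]+", word)
        if not parts:
            continue
        out.append((parts[0], pos))
        if len(parts) > 1:
            # The catenated token shares the position of the first part.
            out.append(("".join(parts), pos))
        for offset, part in enumerate(parts[1:], start=1):
            out.append((part, pos + offset))
        pos += len(parts)
    return out

# Terms stored in the index for the value "hl-2240".
indexed = {token for token, _ in word_delimiter_catenate_all("hl-2240")}

for query in ("hl-2240", "hl 2240", "hl2240"):
    terms = {token for token, _ in word_delimiter_catenate_all(query)}
    # True when every query term exists in the index.
    print(query, sorted(terms), terms <= indexed)
```

In this model "hl-2240" yields hl, hl2240 and 2240 at the same positions as in the _analyze output above, and the terms produced for "hl 2240" and "hl2240" are all present in the index.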
Hope this helps.
Source: https://stackoverflow.com/questions/35697941/which-analyzer-to-get-results-of-appended-words-when-data-has-space-or-dash-in-b