Which analyzer returns results for concatenated words when the data has a space or dash between them?

Submitted by 会有一股神秘感。 on 2019-12-23 02:06:17

Question


My mapping looks like this. As you can see, the Name field is analyzed (it has no "index" setting, so its values go through the default analyzer).

    {
       "state":"open",
       "settings":{
          "index":{
             "creation_date":"1453816191454",
             "number_of_shards":"5",
             "number_of_replicas":"1",
             "version":{
                "created":"1070199"
             }
          }
       },
       "mappings":{
          "Product":{
             "properties":{
                "<field name missing>":{
                   "index":"not_analyzed",
                   "store":true,
                   "type":"string"
                },
                "Name":{
                   "store":true,
                   "type":"string"
                },
                "Number":{
                   "index":"not_analyzed",
                   "store":true,
                   "type":"string"
                },
                "id":{
                   "index":"no",
                   "store":true,
                   "type":"integer"
                }
             }
          }
       },
       "aliases":[]
    }

When I run the following query:

    {
       "query":{
          "match_phrase":{
             "Name":"hl-2240"
          }
       }
    }

This works fine, and "hl 2240" also works, but when I type "hl2240" I don't get any results. I understand that this is because the name is indexed as "hl-2240": I guess I am using the standard (default) analyzer, which tokenizes it into "hl" and "2240". Since there is no "hl2240" token in the inverted index, nothing is found. I learned that I should use a different analyzer, but this is where I am stuck. Which analyzer can I use? Do I have to reindex my data, or can I apply the analyzer only at query time? If I change the analyzer used to index my data, I want to make sure I don't lose results when searching for "hl-2240" or "hl 2240".
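A rough Python simulation of the situation described above: the stored value is tokenized at index time, the tokens go into an inverted index, and the query's tokens are looked up there. It assumes the standard analyzer can be approximated as "split on anything that is not an ASCII letter or digit, then lowercase"; Lucene's real StandardTokenizer follows Unicode word-break rules, but it yields the same tokens for these three inputs. The helpers standard_like_analyze and build_inverted_index are hypothetical names for this sketch, not Elasticsearch APIs.

```python
import re
from collections import defaultdict


def standard_like_analyze(text):
    """Approximation of the standard analyzer: split on non-alphanumerics, lowercase."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def build_inverted_index(docs, analyze):
    """Map every token to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


docs = {1: "hl-2240"}  # the indexed Name value from the question (id is hypothetical)
index = build_inverted_index(docs, standard_like_analyze)

for query in ["hl-2240", "hl 2240", "hl2240"]:
    tokens = standard_like_analyze(query)
    # Simplified lookup: every query token must exist in the index
    # (term positions are ignored here).
    found = all(tok in index for tok in tokens)
    print(repr(query), "->", tokens, "found:", found)
```

Under this model, "hl-2240" and "hl 2240" both become the tokens hl and 2240 and are found, while "hl2240" stays a single token that never occurs in the index.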

Update: the NEST code I tried, based on Richa's answer:

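    // Index names must be lowercase in Elasticsearch, so "myIndex" would be rejected.
    Client.CreateIndex("myindex", ci => ci
        .Analysis(a => a
            // word_delimiter filter with catenate_all, as in Richa's answer
            .TokenFilters(f => f.Add("my_word_delimiter", new WordDelimiterTokenFilter
            {
                CatenateAll = true
            }))
            // custom analyzer: whitespace tokenizer + standard, lowercase, my_word_delimiter
            .Analyzers(an => an.Add("my_analyzer", new CustomAnalyzer
            {
                Tokenizer = "whitespace",
                Filter = new List<string> { "standard", "lowercase", "my_word_delimiter" }
            }))));
    // This only defines the analyzer. The Name field's mapping must also reference
    // "my_analyzer", and existing documents must be reindexed into the new index.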

Answer 1:


Try this analyzer (note the "catenate_all": true setting on the word_delimiter filter):

{
   "settings": {
      "analysis": {
         "filter": {
            "my_word_delimiter": {
               "type": "word_delimiter",
               "catenate_all": true
            }
         },
         "analyzer": {
            "my_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "standard",
                  "lowercase",
                  "my_word_delimiter"
               ]
            }
         }
      }
   }
}

Read about catenate_all.

Use the following command to see how the string is tokenized:

 curl -XGET "localhost:9200/index_8/_analyze?analyzer=my_analyzer&pretty=true" -d 'hl-2240'

This produces the following output, so hl-2240 will be indexed as:

{
  "tokens" : [ {
    "token" : "hl",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "hl2240",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "2240",
    "start_offset" : 3,
    "end_offset" : 7,
    "type" : "word",
    "position" : 1
  } ]
}
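To check the concern from the question (that "hl-2240" and "hl 2240" keep matching once this analyzer is used), here is a rough Python model of my_analyzer together with a simplified match_phrase. Assumptions: ASCII input; word_delimiter with its default split rules (on non-alphanumeric characters and on letter/digit transitions) plus catenate_all; and the same analyzer applied at index time and query time. Terms that share a query position are treated as alternatives, which is roughly how Lucene builds a phrase query from stacked tokens. Token, my_analyzer and phrase_match are hypothetical names for this sketch only.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    token: str
    start_offset: int
    end_offset: int
    position: int


def my_analyzer(text):
    """Rough model of: whitespace tokenizer -> lowercase -> word_delimiter (catenate_all)."""
    tokens = []
    position = 0
    for ws in re.finditer(r"\S+", text):  # whitespace tokenizer
        word = ws.group().lower()  # lowercase filter (ASCII assumed)
        # word_delimiter defaults: split on non-alphanumeric characters and on
        # letter/digit transitions, keeping the letter runs and digit runs.
        parts = [(m.group(), ws.start() + m.start(), ws.start() + m.end())
                 for m in re.finditer(r"[a-z]+|[0-9]+", word)]
        for i, (part, start, end) in enumerate(parts):
            tokens.append(Token(part, start, end, position))
            if i == 0 and len(parts) > 1:
                # catenate_all: one extra token joining all parts, stacked on
                # the same position as the first part.
                joined = "".join(p for p, _, _ in parts)
                tokens.append(Token(joined, start, parts[-1][2], position))
            position += 1
    return tokens


def phrase_match(doc_text, query_text):
    """Simplified match_phrase: query terms sharing a position are alternatives."""
    doc_positions = defaultdict(set)  # term -> positions in the document
    for t in my_analyzer(doc_text):
        doc_positions[t.token].add(t.position)

    query_slots = defaultdict(set)  # query position -> alternative terms
    for t in my_analyzer(query_text):
        query_slots[t.position].add(t.token)
    if not query_slots:
        return False
    slots = sorted(query_slots)
    first = slots[0]

    # Candidate start positions: wherever a first-slot term occurs in the doc.
    starts = set()
    for term in query_slots[first]:
        starts |= doc_positions.get(term, set())

    # The phrase matches if, from some start, every slot has one of its terms
    # at the same relative offset in the document.
    return any(
        all(
            any(start + (slot - first) in doc_positions.get(term, set())
                for term in query_slots[slot])
            for slot in slots
        )
        for start in starts
    )


for t in my_analyzer("hl-2240"):
    print(t)

for query in ["hl-2240", "hl 2240", "hl2240"]:
    print(repr(query), "->", phrase_match("hl-2240", query))
```

In this model the tokens for hl-2240 come out as hl and hl2240 at position 0 and 2240 at position 1, the same as the _analyze output, and all three queries print True.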

Hope it helps.



Source: https://stackoverflow.com/questions/35697941/which-analyzer-to-get-results-of-appended-words-when-data-has-space-or-dash-in-b
