custom analyzer which breaks the tokens on special characters and lowercase/uppercase

送分小仙女 submitted on 2019-12-04 06:07:54

Question


I am trying to write a custom analyzer that breaks tokens on special characters and converts them to uppercase before indexing. Searches should still return results when the query is in lowercase.

For example, given data@source, it should replace @ (or any other special character) with whitespace and give a result like data source.
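The replacement described above can be tried directly with the _analyze API, without creating an index. This is a minimal sketch, assuming an Elasticsearch version that accepts inline char_filter definitions in _analyze. The character class (anything that is not an ASCII letter, digit, or whitespace) is only one assumption about what counts as a "special character", and the whitespace tokenizer is chosen here so that the split comes from the char filter alone.

```
# Sketch only: the pattern is an assumption, not taken from the question.
POST _analyze
{
  "tokenizer": "whitespace",
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "[^A-Za-z0-9\\s]",
      "replacement": " "
    }
  ],
  "filter": ["uppercase"],
  "text": "data@source"
}
```

With these settings the request should yield the two tokens DATA and SOURCE.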

Here is what I tried:

PUT sound
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "uppercase"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(\\d+)-(?=\\d)",
          "replacement": "$1 "
        }
      }
    }
  }
}


POST sound/_analyze
{
  "analyzer": "my_analyzer",
  "text": "data-source&abc"
}

It splits the tokens correctly:

{
   "tokens": [
      {
         "token": "DATA",
         "start_offset": 0,
         "end_offset": 4,
         "type": "<ALPHANUM>",
         "position": 0
      },
      {
         "token": "SOURCE",
         "start_offset": 5,
         "end_offset": 11,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "ABC",
         "start_offset": 12,
         "end_offset": 15,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
} 

But searching in lowercase, or even in uppercase, does not work. For example:

GET sound/_search?text="data"

GET /sound/_search
{
  "query": {
    "match": {
      "text": "data"
    }
  }
}

None of the queries above returns any results.


Answer 1:


You just need to use some slightly different syntax for your searches:

GET sound/_search?q=data

POST sound/_search
{
  "query": {
    "match": {
      "NAME_OF_YOUR_FIELD": "data"
    }
  }
}

NAME_OF_YOUR_FIELD must be the name of the field in which your data is stored. The Elasticsearch documentation on the match query has more information.
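For a match query to go through my_analyzer, the index also needs a mapping that assigns the analyzer to a field; the settings shown in the question define the analyzer but do not attach it to one. The following is a minimal sketch, assuming Elasticsearch 7 or later (typeless mappings). The index name sound_demo and the field name title are hypothetical and do not appear in the question, and the char filter is left out to keep the example short.

```
# Hypothetical index and field names, for illustration only.
PUT sound_demo
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": ["uppercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

# Index one document and make it searchable immediately.
PUT sound_demo/_doc/1?refresh=true
{
  "title": "data-source&abc"
}

# The match query analyzes "data" with the field's analyzer before matching.
GET sound_demo/_search
{
  "query": {
    "match": {
      "title": "data"
    }
  }
}
```

Because no separate search_analyzer is set, the query text is analyzed with the same analyzer as the field, so data and DATA both become the token DATA and should match the indexed document.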



Source: https://stackoverflow.com/questions/39643533/custom-analyzer-which-breaks-the-tokens-on-special-characters-and-lowercase-uppe
