ElasticSearch searching with hyphen inside a word

Submitted by 落爺英雄遲暮 on 2020-01-14 09:18:18

Question


I would like to ask for some help. I want to search for words inside the Title and Content fields. Here is the mapping structure:

'body' => array(
  'mappings' => array(
    'myindex' => array(
      '_source' => array(
        'enabled' => true
      ),
      'properties' => array(
        'Title' => array(
          'type'  => 'string',
          'fields'=> array(
            'raw' => array(
               'type'  => 'string',
               'index' => 'not_analyzed'
              )
            )
          ),
          'Content' => array(
            'type'  => 'string'
          ),
          'Image' => array(
            'type'     => 'string',
            'analyzer' => 'standard'
          )
       )
     )
   )
 )

The query looks like this, where I want to search for "15-g" inside a text like "15-game":

"query" : {
  "query_string": {
    "query": "*15-g*",
    "fields": [ "Title", "Content" ]
  }
}

Please accept my apologies if this duplicates an existing question, but I cannot figure out what is going on and why the query does not return any results.

I've already had a look at:

ElasticSearch - Searching with hyphens

ElasticSearch - Searching with hyphens in name


But I can't get any of those solutions to work for me.

What is really interesting is that if I search for "15 - g" (15space-spaceg) it returns the result.

Thank you so much in advance!


Answer 1:


Add a .raw field to your Content as well and make the search on the .raw fields:

{
  "query": {
    "query_string": {
      "query": "*15-g*",
      "fields": [
        "Title.raw",
        "Content.raw"
      ]
    }
  }
}
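
Why the .raw fields help can be sketched in Python. This is only a rough model, not Elasticsearch itself: approx_standard_analyzer and wildcard_hits are hypothetical helpers, the regular expression merely approximates how the standard analyzer splits and lowercases text, fnmatchcase stands in for wildcard matching against individual indexed terms, and the sample title is made up.

```python
import re
from fnmatch import fnmatchcase


def approx_standard_analyzer(text):
    """Rough stand-in for the standard analyzer (an assumption, not the real
    implementation): split on anything that is not a letter or digit, then
    lowercase each piece."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def wildcard_hits(terms, pattern):
    """Return the indexed terms that a wildcard pattern matches.
    A wildcard is compared against one whole term at a time."""
    return [t for t in terms if fnmatchcase(t, pattern)]


title = "My 15-game season"  # hypothetical document value

# Analyzed field: the hyphen is a split point, so no single term contains "15-g".
analyzed_terms = approx_standard_analyzer(title)
print(analyzed_terms)                           # ['my', '15', 'game', 'season']
print(wildcard_hits(analyzed_terms, "*15-g*"))  # []

# not_analyzed (.raw) field: the whole value is stored as one term.
raw_terms = [title]
print(wildcard_hits(raw_terms, "*15-g*"))       # ['My 15-game season']
```

In this model the analyzed field never holds a term with a hyphen in it, which is consistent with the query in the question returning nothing, while the single untouched term in the .raw field does match.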

Any space in the search text that should match literally against your fields needs to be escaped (with \). Also, whenever the query combines upper-case letters with wildcards and you want it to match the .raw fields as written, you need to set lowercase_expanded_terms to false. That setting is true by default, so the search string would otherwise be lowercased (the query below would search for laptop - black):

{
  "query": {
    "query_string": {
      "query": "*Laptop\\ -\\ Black*",
      "lowercase_expanded_terms": false, 
      "fields": [
        "Title.raw",
        "Content.raw"
      ]
    }
  }
}
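
The effect of lowercase_expanded_terms on a not_analyzed field can be illustrated with a small Python sketch. It models only the lowercasing step described above; expand_wildcard is a hypothetical helper, and fnmatchcase is used as a case-sensitive stand-in for wildcard matching against the stored term.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, lowercase_expanded_terms=True):
    """Hypothetical model of the setting: lowercase the wildcard pattern
    before matching when the flag is on, leave it untouched otherwise."""
    return pattern.lower() if lowercase_expanded_terms else pattern


raw_term = "Laptop - Black"   # value stored as-is in a not_analyzed field
pattern = "*Laptop - Black*"  # what the user typed (spaces already unescaped)

# Default behaviour: the pattern becomes "*laptop - black*" and misses the term.
print(fnmatchcase(raw_term, expand_wildcard(pattern, True)))   # False

# With the flag disabled the pattern keeps its case and matches.
print(fnmatchcase(raw_term, expand_wildcard(pattern, False)))  # True
```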



Answer 2:


In Elasticsearch 5, you can define a custom analyzer with its own token filter settings. Here is some example code:

PUT test1
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "myAnalyzer" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : [ "dont_split_on_numerics" ]
        }
      },
      "filter" : {
        "dont_split_on_numerics" : {
          "type" : "word_delimiter",
          "preserve_original": true,
          "generate_number_parts" : false
        }
      }
    }
  },
  "mappings": {
    "type_one": {
      "properties": {
        "title": { 
          "type": "text",
          "analyzer": "standard"
        }
      }
    },
    "type_two": {
      "properties": {
        "raw": { 
          "type": "text",
          "analyzer": "myAnalyzer"
        }
      }
    }
  }
}

Note that I set

"preserve_original": true and "generate_number_parts": false

so that the string "2-345-6789" is kept as it is. A dash is treated as a word delimiter in Elasticsearch: the standard tokenizer, for example, would split this string into "2", "345", and "6789". With the settings above, you can now use a "wildcard" search such as

"*5-67*"

to get the result.

POST test1/type_two/1
{
  "raw": "2-345-6789"
}

GET test1/type_two/_search
{
  "query": {
    "wildcard": {
      "raw": "*5-67*"
    }
  }
}
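
What the two filter options change can be approximated in Python. This is a simplified model under stated assumptions, not the real word_delimiter filter: approx_word_delimiter is a hypothetical helper that only splits a token into letter runs and digit runs, and it ignores the filter's other options (case-change splitting, catenation, and so on).

```python
import re
from fnmatch import fnmatchcase


def approx_word_delimiter(token, preserve_original=False,
                          generate_number_parts=True,
                          generate_word_parts=True):
    """Simplified model (an assumption) of the word_delimiter filter:
    optionally keep the original token, then emit its digit runs and
    letter runs depending on the two generate_* flags."""
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    if parts == [token]:
        return [token]  # nothing to split, avoid emitting a duplicate
    out = [token] if preserve_original else []
    for part in parts:
        if part.isdigit() and generate_number_parts:
            out.append(part)
        elif part.isalpha() and generate_word_parts:
            out.append(part)
    return out


value = "2-345-6789"

# Default-like options: only the number parts survive, the hyphens are gone.
default_terms = approx_word_delimiter(value)
print(default_terms)  # ['2', '345', '6789']
print([t for t in default_terms if fnmatchcase(t, "*5-67*")])  # []

# Options from the answer: the original token is the only term.
custom_terms = approx_word_delimiter(
    value, preserve_original=True, generate_number_parts=False)
print(custom_terms)  # ['2-345-6789']
print([t for t in custom_terms if fnmatchcase(t, "*5-67*")])  # ['2-345-6789']
```

The whitespace tokenizer matters here as well: it leaves "2-345-6789" as a single token for the filter to inspect, whereas a tokenizer that splits on hyphens would leave no original token to preserve.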

More detail can be found in the Elasticsearch documentation on token filters.



Source: https://stackoverflow.com/questions/31301849/elasticsearch-searching-with-hyphen-inside-a-word
