Elasticsearch exact matches on analyzed fields

清酒与你 2020-12-06 04:28

Is there a way to have ElasticSearch identify exact matches on analyzed fields? Ideally, I would like to lowercase, tokenize, stem and perhaps even phoneticize my docs, then

3 answers
  •  既然无缘
    2020-12-06 05:03

    You can use multi-fields for that purpose and have a not_analyzed sub-field within your analyzed field (let's call it item in this example). Your mapping would have to look like this:

    {
      "yourtype": {
        "properties": {
          "item": {
            "type": "string",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          }
        }
      }
    }
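
    The answer shows the mapping but not how it is applied. As a minimal sketch, assuming the index is called yourtypes (as in the curl calls in this answer) and that the mapping is supplied when the index is created, the request body could be built like this. The helper name create_index_body is made up for illustration:

```python
import json

def create_index_body(type_name="yourtype", field="item"):
    """Wrap the multi-field mapping from the answer in a create-index body."""
    return {
        "mappings": {
            type_name: {
                "properties": {
                    field: {
                        "type": "string",  # analyzed with the default analyzer
                        "fields": {
                            # exact-match sub-field, addressed as <field>.raw
                            "raw": {"type": "string", "index": "not_analyzed"}
                        },
                    }
                }
            }
        }
    }

body = json.dumps(create_index_body())
# Print a curl command that would create the index (assumes a local node).
print(f"curl -XPUT localhost:9200/yourtypes -d '{body}'")
```

    This only prints the command and does not contact a cluster. On later Elasticsearch versions, where string was split into text and keyword, the raw sub-field would presumably be declared as a keyword field instead.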
    

    With this mapping in place, you can check how each of the values Hamburger and Hamburger Buns is "viewed" by the analyzer for your multi-field item and its sub-field item.raw:

    For Hamburger:

    curl -XGET 'localhost:9200/yourtypes/_analyze?field=item&pretty' -d 'Hamburger'
    {
      "tokens" : [ {
        "token" : "hamburger",
        "start_offset" : 0,
        "end_offset" : 9,
        "type" : "<ALPHANUM>",
        "position" : 1
      } ]
    }
    curl -XGET 'localhost:9200/yourtypes/_analyze?field=item.raw&pretty' -d 'Hamburger'
    {
      "tokens" : [ {
        "token" : "Hamburger",
        "start_offset" : 0,
        "end_offset" : 9,
        "type" : "word",
        "position" : 1
      } ]
    }
    

    For Hamburger Buns:

    curl -XGET 'localhost:9200/yourtypes/_analyze?field=item&pretty' -d 'Hamburger Buns'
    {
      "tokens" : [ {
        "token" : "hamburger",
        "start_offset" : 0,
        "end_offset" : 9,
        "type" : "<ALPHANUM>",
        "position" : 1
      }, {
        "token" : "buns",
        "start_offset" : 10,
        "end_offset" : 14,
        "type" : "<ALPHANUM>",
        "position" : 2
      } ]
    }
    curl -XGET 'localhost:9200/yourtypes/_analyze?field=item.raw&pretty' -d 'Hamburger Buns'
    {
      "tokens" : [ {
        "token" : "Hamburger Buns",
        "start_offset" : 0,
        "end_offset" : 14,
        "type" : "word",
        "position" : 1
      } ]
    }
    

    As you can see, the not_analyzed sub-field is indexed untouched, exactly as it was entered.
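
    To make the difference concrete without a running cluster, here is a rough sketch of the two behaviours. The analyze function is only a simplified stand-in for the analyzer on item (lowercasing and splitting on non-alphanumeric characters, with no stemming), not Elasticsearch's actual implementation:

```python
import re

def analyze(text):
    """Rough stand-in for the analyzed `item` field: lowercase, then split into words."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def not_analyzed(text):
    """`item.raw`: the whole input is kept as one unmodified token."""
    return [text]

for value in ("Hamburger", "Hamburger Buns"):
    print(value, "->", analyze(value), "|", not_analyzed(value))
```

    The analyzed field yields one lowercased token per word, while the raw sub-field keeps a single token that preserves case and spaces.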

    Now, let's index two sample documents to illustrate this:

    curl -XPOST localhost:9200/yourtypes/_bulk -d '
    {"index": {"_type": "yourtype", "_id": 1}}
    {"item": "Hamburger"}
    {"index": {"_type": "yourtype", "_id": 2}}
    {"item": "Hamburger Buns"}
    '
    

    Finally, to answer your question: if you want an exact match on Hamburger, search the item.raw sub-field like this (note that the case has to match, too):

    curl -XPOST localhost:9200/yourtypes/yourtype/_search -d '{
      "query": {
        "term": {
          "item.raw": "Hamburger"
        }
      }
    }'
    

    And you'll get:

    {
      ...
      "hits" : {
        "total" : 1,
        "max_score" : 0.30685282,
        "hits" : [ {
          "_index" : "yourtypes",
          "_type" : "yourtype",
          "_id" : "1",
          "_score" : 0.30685282,
          "_source":{"item": "Hamburger"}
        } ]
      }
    }
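
    If the case-sensitivity of item.raw is a problem, one possible variation (not part of the original answer) is a second sub-field that uses a custom analyzer built from the keyword tokenizer and the lowercase filter. The names lowercase_keyword and lower below are hypothetical, chosen only for this sketch:

```python
import json

# Hypothetical names: `lowercase_keyword` (analyzer) and `lower` (sub-field)
# are made up for this sketch; they do not appear in the answer.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",   # whole input stays one token
                    "filter": ["lowercase"],  # but that token is lowercased
                }
            }
        }
    },
    "mappings": {
        "yourtype": {
            "properties": {
                "item": {
                    "type": "string",
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"},
                        "lower": {"type": "string", "analyzer": "lowercase_keyword"},
                    },
                }
            }
        }
    },
}

# A match query analyzes its input with the field's analyzer, so the
# query string is lowercased before it is compared with the indexed token.
query = {"query": {"match": {"item.lower": "hAmBuRgEr"}}}

print(json.dumps(index_body, indent=2))
print(json.dumps(query, indent=2))
```

    With such a sub-field, the query above should match the Hamburger document but not Hamburger Buns, because each value is indexed as one lowercased token.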
    

    UPDATE (see comments/discussion below and question re-edit)

    Taking your example from the comments: to have HaMbUrGeR BuNs match Hamburger Buns, you can simply use a match query with the and operator on the analyzed item field, like this:

    curl -XPOST 'localhost:9200/yourtypes/yourtype/_search?pretty' -d '{
      "query": {
        "match": {
          "item": {
            "query": "HaMbUrGeR BuNs",
            "operator": "and"
          }
        }
      }
    }'
    

    Which, based on the same two documents indexed above, will yield:

    {
      ...
      "hits" : {
        "total" : 1,
        "max_score" : 0.2712221,
        "hits" : [ {
          "_index" : "yourtypes",
          "_type" : "yourtype",
          "_id" : "2",
          "_score" : 0.2712221,
          "_source":{"item": "Hamburger Buns"}
        } ]
      }
    }
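
    The two approaches behave differently, which a small in-memory sketch can illustrate. It assumes the same two documents and uses a simplified stand-in for the analyzer (lowercase plus word split, no stemming). The function names term_raw and match_and are made up for illustration:

```python
import re

DOCS = {1: "Hamburger", 2: "Hamburger Buns"}

def analyze(text):
    """Simplified stand-in for the analyzer on `item` (lowercase + word split)."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def term_raw(value):
    """`term` on item.raw: the stored string must equal the query exactly."""
    return [doc_id for doc_id, item in DOCS.items() if item == value]

def match_and(query):
    """`match` with operator "and": every query token must occur in the document."""
    wanted = set(analyze(query))
    return [doc_id for doc_id, item in DOCS.items() if wanted <= set(analyze(item))]

print(term_raw("Hamburger"))        # [1]
print(term_raw("hamburger"))        # []
print(match_and("HaMbUrGeR BuNs"))  # [2]
print(match_and("hamburger"))       # [1, 2]
```

    Note the last line: a match query with the and operator only requires that all query tokens are present, so searching for hamburger alone also returns Hamburger Buns. It is case-insensitive but not a strict exact match.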
    
