Elasticsearch Java API: search returns no results unless my input query contains (*)

Submitted by 五迷三道 on 2019-12-25 03:08:02

Question


I am fetching documents from Elasticsearch using the Java API. My documents contain the following code, and I am trying to search for it with the patterns below.

code: MS-VMA1615-0D

Input: *VMA1615-0*     -- I get the result (MS-VMA1615-0D).
Input: MS-VMA1615-0D   -- I get the result (MS-VMA1615-0D).
Input: *VMA1615-0      -- I get the result (MS-VMA1615-0D).
Input: *VMA*-0*        -- I get the result (MS-VMA1615-0D).

But if I give input like the following, I get no results:

Input: VMA1615         -- No results.

I expect it to return the code MS-VMA1615-0D.

Here is the Java code I am using:

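// Elasticsearch 6.x high-level REST client classes used below
import java.io.IOException;
import java.io.UncheckedIOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryStringQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

private final String INDEX = "products";
private final String TYPE = "doc";

    SearchRequest searchRequest = new SearchRequest(INDEX);
    searchRequest.types(TYPE);

    // query_string query on the "code" field; the input is analyzed with that field's analyzer
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    QueryStringQueryBuilder qsQueryBuilder = new QueryStringQueryBuilder(code);
    qsQueryBuilder.defaultField("code");
    searchSourceBuilder.query(qsQueryBuilder);
    searchSourceBuilder.size(50);
    searchRequest.source(searchSourceBuilder);

    SearchResponse searchResponse;
    try {
        // SearchEngineClient is the application's own wrapper, not an Elasticsearch class
        searchResponse = SearchEngineClient.getInstance().search(searchRequest);
    } catch (IOException e) {
        // Rethrow instead of discarding the error: otherwise searchResponse stays null
        // and the getHits() call below throws a NullPointerException.
        throw new UncheckedIOException(e);
    }
    Item item = null;
    SearchHit[] searchHits = searchResponse.getHits().getHits();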

Here are my mapping details:

PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

Answer 1:


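To get the behavior you are looking for, you need to change the tokenizer. You are currently using the whitespace tokenizer, which splits text only on whitespace, so MS-VMA1615-0D is indexed as a single term, ms-vma1615-0d (after the lowercase filter). Wildcard queries such as *VMA1615-0* can match inside that single term, but a plain VMA1615 has to equal a whole term, so it finds nothing. Replace the whitespace tokenizer with the pattern tokenizer, so your new mapping looks like this: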

PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "pattern",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

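After you recreate the index with this mapping and reindex your documents (the analyzer of an existing field cannot be changed in place), a query for VMA1615 returns MS-VMA1615-0D.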

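This works because the pattern tokenizer, whose default pattern is \W+, splits on non-word characters, so the string "MS-VMA1615-0D" is tokenized into "MS", "VMA1615" and "0D" (stored as ms, vma1615 and 0d after the lowercase filter). A query that contains any of these terms matches the document. You can check the tokenization with the _analyze API: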

POST _analyze
{
  "tokenizer": "pattern",
  "text": "MS-VMA1615-0D"
}

will return:

{
  "tokens": [
    {
      "token": "MS",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "VMA1615",
      "start_offset": 3,
      "end_offset": 10,
      "type": "word",
      "position": 1
    },
    {
      "token": "0D",
      "start_offset": 11,
      "end_offset": 13,
      "type": "word",
      "position": 2
    }
  ]
}
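To make the difference between the two tokenizers concrete, here is a plain-Java sketch that approximates each analyzer with a regex split followed by lowercasing. It is not Elasticsearch code: the class TokenizerComparison and its methods are hypothetical, the wildcard check only understands *, and it assumes the wildcard term is lowercased the same way as the indexed text. It illustrates why VMA1615 misses the single whitespace-tokenized term while *VMA1615-0* can still match it, and why the pattern tokenizer fixes the plain query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Plain-Java approximation of the two analyzers discussed in this answer.
// It only mimics their token output; it is not Elasticsearch's implementation.
public class TokenizerComparison {

    // Split on a delimiter regex, drop empty pieces, then lowercase (like the lowercase filter).
    static List<String> analyze(String text, Pattern delimiter) {
        List<String> terms = new ArrayList<>();
        for (String piece : delimiter.split(text)) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Approximates: whitespace tokenizer + lowercase filter
    static List<String> whitespaceAnalyzer(String text) {
        return analyze(text, Pattern.compile("\\s+"));
    }

    // Approximates: pattern tokenizer with its default pattern \W+ + lowercase filter
    static List<String> patternAnalyzer(String text) {
        return analyze(text, Pattern.compile("\\W+"));
    }

    // Simplified wildcard check against one stored term: only '*' is supported.
    static boolean wildcardMatches(String wildcard, String term) {
        String[] literals = wildcard.toLowerCase(Locale.ROOT).split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each '*' becomes "any characters"
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return term.matches(regex.toString());
    }

    // A document matches if any analyzed query term is among its stored terms
    // (query_string uses OR as its default operator).
    static boolean matches(List<String> storedTerms, List<String> queryTerms) {
        for (String q : queryTerms) {
            if (storedTerms.contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String code = "MS-VMA1615-0D";

        List<String> ws = whitespaceAnalyzer(code);
        List<String> pat = patternAnalyzer(code);
        System.out.println("whitespace terms: " + ws);
        System.out.println("pattern terms:    " + pat);

        // The query text is analyzed with the same analyzer at search time.
        System.out.println(matches(ws, whitespaceAnalyzer("VMA1615")));
        System.out.println(matches(pat, patternAnalyzer("VMA1615")));
        System.out.println(wildcardMatches("*VMA1615-0*", ws.get(0)));
    }
}
```

Running main prints the term lists [ms-vma1615-0d] and [ms, vma1615, 0d], followed by false, true and true.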

Based on your comment:

That is not how Elasticsearch works. Elasticsearch stores terms, and the documents that contain them, in an inverted index, and which terms get stored depends on how the field's analyzer tokenizes the text. For example, a whitespace tokenizer splits the text "Hi there I am a technocrat" into ["Hi", "there", "I", "am", "a", "technocrat"]. After indexing, a query for "technocrat" returns the document, because the inverted index associates that term with it. In your case, "VMA" is never stored as a term (the default pattern tokenizer produces "VMA1615", not "VMA"), so a query for "VMA" alone does not match.
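The inverted-index idea described above can be sketched in a few lines of Java. This is a toy model under simplifying assumptions: a regex split plus lowercasing stands in for the analyzer, and a HashMap from term to document ids stands in for Lucene's index. ToyInvertedIndex and its methods are hypothetical names, not Elasticsearch APIs. Using \W+, the default pattern of the pattern tokenizer, it shows that VMA1615 is found while VMA is not.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Toy inverted index: each term maps to the ids of the documents that contain it.
// A teaching sketch only; Lucene's real index structures are far more involved.
public class ToyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Pattern delimiter;

    ToyInvertedIndex(String delimiterRegex) {
        this.delimiter = Pattern.compile(delimiterRegex);
    }

    // Stand-in for the field analyzer: split, drop empty tokens, lowercase.
    List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : delimiter.split(text)) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Indexing: record the document id under every term the analyzer produces.
    void index(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Searching: analyze the query the same way and look each term up (OR semantics).
    Set<Integer> search(String queryText) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(queryText)) {
            hits.addAll(postings.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        // \W+ mirrors the default pattern of the pattern tokenizer
        ToyInvertedIndex idx = new ToyInvertedIndex("\\W+");
        idx.index(1, "Hi there I am a technocrat");
        idx.index(2, "MS-VMA1615-0D");

        System.out.println(idx.search("technocrat")); // [1]
        System.out.println(idx.search("VMA1615"));    // [2]
        System.out.println(idx.search("VMA"));        // [] because "vma" was never stored
    }
}
```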

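To make "VMA" a term of its own, use the mapping below, which defines a custom pattern tokenizer that splits on every hyphen and every digit: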

PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "my_pattern_tokenizer",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "tokenizer": {
        "my_pattern_tokenizer": {
          "type": "pattern",
          "pattern": "-|\\d"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

To check the tokens it produces:

POST products/_analyze
{
  "tokenizer": "my_pattern_tokenizer",
  "text": "MS-VMA1615-0D"
}

will produce:

{
  "tokens": [
    {
      "token": "MS",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "VMA",
      "start_offset": 3,
      "end_offset": 6,
      "type": "word",
      "position": 1
    },
    {
      "token": "D",
      "start_offset": 12,
      "end_offset": 13,
      "type": "word",
      "position": 2
    }
  ]
}
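The split behavior of my_pattern_tokenizer can be reproduced with java.util.regex to see where the offsets in the _analyze output come from. This is a sketch that assumes the tokenizer treats each regex match as a delimiter and skips zero-length tokens between adjacent delimiters, which is consistent with the output above; DigitSplitTokenizer, Token and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Emulates a pattern tokenizer in split mode: the regex marks delimiters,
// the text between matches becomes tokens, and empty tokens are skipped.
public class DigitSplitTokenizer {

    static final class Token {
        final String text;
        final int start; // inclusive, like start_offset
        final int end;   // exclusive, like end_offset

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + " " + start + "-" + end;
        }
    }

    static List<Token> tokenize(String text, Pattern delimiter) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = delimiter.matcher(text);
        int start = 0;
        while (m.find()) {
            if (m.start() > start) { // skip the empty gap between adjacent delimiters
                tokens.add(new Token(text.substring(start, m.start()), start, m.start()));
            }
            start = m.end();
        }
        if (start < text.length()) { // trailing token after the last delimiter
            tokens.add(new Token(text.substring(start), start, text.length()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same regex as "-|\\d" in the JSON settings: a hyphen or any single digit
        Pattern delimiter = Pattern.compile("-|\\d");
        for (Token t : tokenize("MS-VMA1615-0D", delimiter)) {
            System.out.println(t); // MS 0-2, then VMA 3-6, then D 12-13
        }
        // Digits are discarded, so a purely numeric input yields no tokens
        System.out.println(tokenize("1615", delimiter).isEmpty()); // true
    }
}
```

One consequence worth keeping in mind: because every digit is a delimiter, the numeric parts of a code are not indexed at all, so a query for 1615 on its own produces no terms.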


Source: https://stackoverflow.com/questions/51212683/elasticsearch-javaapi-searching-not-happening-without-in-my-input-query
