Changing the default analyzer in ElasticSearch or LogStash

Submitted by 房东的猫 on 2019-12-23 09:15:02

Question


I've got data coming in from Logstash that's being analyzed in an overeager manner. Essentially, the field "OS X 10.8" would be broken into "OS", "X", and "10.8". I know I could just change the mapping and re-index for existing data, but how would I change the default analyzer (either in ElasticSearch or LogStash) to avoid this problem in future data?

Concrete Solution: I created a mapping for the type before I sent data to the new cluster for the first time.

Solution from IRC: Create an Index Template


Answer 1:


As you know, Elasticsearch uses the standard analyzer when no analyzer is specified explicitly. So when you set up the template, you can define a custom analyzer that is named standard, and in it set your own rules for the analyzer: its tokenizer and token filters.

Here are some links that explain this in more detail:

http://elasticsearch-users.115913.n3.nabble.com/How-we-can-change-Elasticsearch-default-analyzer-td4040411.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html




Answer 2:


According to this page, analyzers can be specified per query, per field, or per index.

At index time, Elasticsearch will look for an analyzer in this order:

  • The analyzer defined in the field mapping.
  • An analyzer named default in the index settings.
  • The standard analyzer.

At query time, there are a few more layers:

  • The analyzer defined in a full-text query.
  • The search_analyzer defined in the field mapping.
  • The analyzer defined in the field mapping.
  • An analyzer named default_search in the index settings.
  • An analyzer named default in the index settings.
  • The standard analyzer.
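
The two lookup orders above can be sketched as a small lookup function. This is only an illustration of the precedence described in the lists, not Elasticsearch's actual implementation; the function names and the plain dictionaries standing in for the field mapping and the index's analyzer settings are assumptions made for the example.

```python
from typing import Optional


def resolve_index_analyzer(field_mapping: dict, index_analyzers: dict) -> str:
    """Return the name of the analyzer used at index time (hypothetical helper)."""
    # 1. The analyzer defined in the field mapping.
    if "analyzer" in field_mapping:
        return field_mapping["analyzer"]
    # 2. An analyzer named "default" in the index settings.
    if "default" in index_analyzers:
        return "default"
    # 3. The standard analyzer.
    return "standard"


def resolve_search_analyzer(
    query_analyzer: Optional[str],
    field_mapping: dict,
    index_analyzers: dict,
) -> str:
    """Return the name of the analyzer used at query time (hypothetical helper)."""
    # 1. The analyzer defined in the full-text query itself.
    if query_analyzer is not None:
        return query_analyzer
    # 2-3. search_analyzer, then analyzer, from the field mapping.
    for key in ("search_analyzer", "analyzer"):
        if key in field_mapping:
            return field_mapping[key]
    # 4-5. default_search, then default, from the index settings.
    for name in ("default_search", "default"):
        if name in index_analyzers:
            return name
    # 6. The standard analyzer.
    return "standard"
```

With an empty field mapping and an index that only defines an analyzer called default, both functions fall through to "default", which is why overriding that one name changes the behaviour for every field that has no analyzer of its own.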

On the other hand, this page points out an important thing:

An analyzer is registered under a logical name. It can then be referenced from mapping definitions or certain APIs. When none are defined, defaults are used. There is an option to define which analyzers will be used by default when none can be derived.

So the only way to define a custom analyzer as the default is to override one of the predefined analyzers, in this case the default analyzer. This means we cannot use an arbitrary name for our analyzer; it must be named default.

Here is a simple example of index settings (note that the analyzer is registered under the name default):

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "char_filter": {
        "charMappings": {
          "type": "mapping",
          "mappings": [
            "\\u200C => "
          ]
        }
      },
      "filter": {
        "persian_stop": {
          "type": "stop",
          "stopwords_path": "stopwords.txt"
        }
      },
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "char_filter": [
            "charMappings"
          ],
          "filter": [
            "lowercase",
            "arabic_normalization",
            "persian_normalization",
            "persian_stop"
          ]
        }
      }
    }
  }
}
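
Applied to the original question, the same idea could be combined with the index template suggested on IRC. The sketch below only builds a template body as a Python dictionary and prints it as JSON; it does not talk to a cluster. Several things in it are assumptions rather than part of the answers above: the `logstash-*` pattern, the helper name `build_logstash_template`, the choice of the `keyword` tokenizer with a `lowercase` filter, and the `index_patterns` key (older Elasticsearch versions used a single `template` string instead, so check the documentation for your version).

```python
import json


def build_logstash_template(pattern: str = "logstash-*") -> dict:
    """Build an index template body whose default analyzer does not split values.

    Hypothetical helper: the pattern and the analyzer choice are assumptions.
    """
    return {
        # Assumed key name; older versions expect "template": pattern instead.
        "index_patterns": [pattern],
        "settings": {
            "analysis": {
                "analyzer": {
                    # The name must be "default" to replace the fallback analyzer.
                    "default": {
                        "type": "custom",
                        # The keyword tokenizer emits the whole value as one
                        # token, so "OS X 10.8" is not broken apart.
                        "tokenizer": "keyword",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be sent to the cluster with a tool of your choice.
    print(json.dumps(build_logstash_template(), indent=2))
```

Keep in mind that a keyword-style default applies to every field in matching indices that has no analyzer of its own, so free-text fields that should still be tokenized would need an explicit analyzer in their mapping.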


Source: https://stackoverflow.com/questions/19776784/changing-the-default-analyzer-in-elasticsearch-or-logstash
