Elasticsearch count terms ignoring spaces

Posted by 牧云@^-^@ on 2019-12-04 09:42:39

Question


Using ES 1.2.1

My aggregation

{
    "size": 0,
    "aggs": {
        "cities": {
            "terms": {
                "field": "city",
                "size": 300000
            }
        }
    }
}

The issue is that some city names contain spaces, and each word is aggregated as a separate term.

For instance Los Angeles

{
    "key": "Los",
    "doc_count": 2230
},
{
    "key": "Angeles",
    "doc_count": 2230
},

I assume it has to do with the analyzer? Which one would I use to not split on spaces?


Answer 1:


For fields that you want to aggregate on, I would recommend either using the keyword analyzer or not analyzing the field at all. From the keyword analyzer documentation:

An analyzer of type keyword that "tokenizes" an entire stream as a single token. This is useful for data like zip codes, ids and so on. Note, when using mapping definitions, it might make more sense to simply mark the field as not_analyzed.
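As a rough sketch of the not_analyzed option, an index-creation body for ES 1.x might look like the following. The type name place is hypothetical (the question does not say what the type is called); only the city field comes from the question.

```json
{
  "mappings": {
    "place": {
      "properties": {
        "city": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
```

With a mapping like this, "Los Angeles" should be indexed as one term and come back as a single bucket. As far as I know, an existing analyzed field cannot be switched to not_analyzed in place, so this would normally mean creating a new index and reindexing the documents.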

However, if you still want the field analyzed for other searches, consider using the fields setting (multi-fields) available in ES 1.x, as described in the field/multi_field documentation. This lets you keep one version of the field for searching and another for aggregations.
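A minimal sketch of such a multi-field mapping for ES 1.x is shown here. The type name place and the sub-field name raw are assumptions chosen for illustration and do not appear in the question.

```json
{
  "mappings": {
    "place": {
      "properties": {
        "city": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
```

Under this mapping, full-text queries would keep using city, while the terms aggregation would point at the untouched copy by setting "field": "city.raw". Documents indexed before the sub-field was added would likely need to be reindexed before they show up in city.raw.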




Answer 2:


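There are two approaches to solving this.

  1. The not_analyzed way - this keeps each value as a single term, but it is case-sensitive, so "Los Angeles" and "los angeles" are counted as different terms.
  2. The keyword tokenizer way - combined with a lowercase filter in a custom analyzer, this lets values that differ only in case be counted as one term.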

These two concepts with working code examples are illustrated in this blog.
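For the second approach, one possible setup (a sketch, not taken from the linked blog) is a custom analyzer that pairs the keyword tokenizer with the lowercase token filter. The analyzer name lowercase_keyword and the type name place are hypothetical.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "place": {
      "properties": {
        "city": {
          "type": "string",
          "analyzer": "lowercase_keyword"
        }
      }
    }
  }
}
```

One trade-off to keep in mind: the bucket keys returned by the aggregation would then be the lowercased terms (for example "los angeles"), not the original spelling.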



Source: https://stackoverflow.com/questions/24189381/elasticsearch-count-terms-ignoring-spaces
