ElasticSearch + Kibana - Unique count using pre-computed hashes


Question


Update: added the ElasticSearch query and stack trace (see the pastebin links below).

I want to perform a unique count on my ElasticSearch cluster. The cluster contains about 50 million records.

I've tried the following methods:

First method

Mentioned in this section of the Elasticsearch documentation:

Pre-computing hashes is usually only useful on very large and/or high-cardinality fields as it saves CPU and memory.
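As a sketch of how this is meant to be used, the unique count then runs a cardinality aggregation over the pre-computed hash sub-field instead of the original string (the index name and aggregation name here are illustrative):

POST /my_index/_search
{
  "size": 0,
  "aggs": {
    "unique_my_prop": {
      "cardinality": {
        "field": "my_prop.hash"
      }
    }
  }
}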

Second method

Mentioned in this section of the Elasticsearch documentation:

Unless you configure Elasticsearch to use doc_values as the field data format, the use of aggregations and facets is very demanding on heap space.

My property mapping

"my_prop": {
  "index": "not_analyzed",
  "fielddata": {
    "format": "doc_values"
  },
  "doc_values": true,
  "type": "string",
  "fields": {
    "hash": {
      "type": "murmur3"
    }
  }
}

The problem

When I use a unique count on my_prop.hash in Kibana, I receive the following error:

Data too large, data for [my_prop.hash] would be larger than limit

ElasticSearch has a 2 GB heap. The above also fails for a single index with 4 million records.

My questions

  1. Am I missing something in my configuration?
  2. Should I scale up my machine? That does not seem like a scalable solution.

ElasticSearch query

Was generated by Kibana: http://pastebin.com/hf1yNLhE

ElasticSearch Stack trace

http://pastebin.com/BFTYUsVg


Answer 1:


That error says you don't have enough memory (more specifically, memory for fielddata) to store all the values of hash, so you need to take them off the heap and put them on disk, which means using doc_values.

Since you are already using doc_values for my_prop, I suggest doing the same for my_prop.hash (and no, the settings from the main field are not inherited by the sub-fields): "hash": { "type": "murmur3", "index": "no", "doc_values": true }.
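Putting that together with the mapping from the question, the property would look roughly like this (a sketch combining the original mapping with the suggested change, not verified against a particular Elasticsearch version):

"my_prop": {
  "index": "not_analyzed",
  "fielddata": {
    "format": "doc_values"
  },
  "doc_values": true,
  "type": "string",
  "fields": {
    "hash": {
      "type": "murmur3",
      "index": "no",
      "doc_values": true
    }
  }
}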



Source: https://stackoverflow.com/questions/30766792/elasticsearch-kibana-unique-count-using-pre-computed-hashes
