ElasticSearch - Statistical facet on length of string field

僤鯓⒐⒋嵵緔 提交于 2019-12-06 04:40:50

问题


I would like to retrieve data about a string field like the min, max and average length (by counting the number of characters inside the string). My issue is that aggregations can only be used for numeric fields. Besides, I tried it using a simple statistical facet,

 "query":{
      "match_all": {}
  }, 
 "facets":{
      "stat1":{
           "statistical":{
               "field":"title"}
               }
          } 

but I get shard failures and SearchPhaseExecutionException. When trying with a script field the error returned is an OutOfMemoryError:

  "query":{
       "match_all": {}
   }, 
  "script_fields":{
       "test1":{"script": "doc[\"title\"].value" }
   }

Is it possible to retrive such data about a simple "title" string field using CURL? Thank you!


回答1:


I haven't actually tried the following, but I believe it should work.

First some useful doc-references:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-statistical-facet.html.

In order to implement the statistical facet, the relevant field values are loaded into memory from the index. This means that per shard, there should be enough memory to contain them. Since by default, dynamic introduced types are long and double, one option to reduce the memory footprint is to explicitly set the types for the relevant fields to either short, integer, or float when possible.

I'm not sure directly how to set the type of the script-field to 'short' which is probably what you want. to reduce memory. it SHOULD be possible though.

ALSO: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-script-fields.html

It’s important to understand the difference between doc['my_field'].value and _source.my_field. The first, using the doc keyword, will cause the terms for that field to be loaded to memory (cached), which will result in faster execution, but more memory consumption. Also, the doc[...] notation only allows for simple valued fields (can’t return a json object from it) and make sense only on non-analyzed or single term based fields.

So ALTERNATIVE: would be to use _source instead of doc which would not cache the lengths.

Gives:

    {
        "query" : {
            "match_all" : {}
        },
        "facets" : {
            "stat1" : {
                "statistical" : {
                    "script" : "doc['title'].value.length()
                    //"script" : "_source.title.length() //ALTERNATIVE which isn't cached
                }
            }
        }
    }


来源:https://stackoverflow.com/questions/23023233/elasticsearch-statistical-facet-on-length-of-string-field

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!