UTF8 encoding is longer than the max length 32766

鱼传尺愫 2020-11-29 01:39

I've upgraded my Elasticsearch cluster from 1.1 to 1.2, and I get errors when indexing a somewhat big string.

{
  "error": "IllegalArgumentException[Document contains at least one immense term in field=\"...\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped...]"
}

10 Answers
  •  醉话见心
    2020-11-29 02:28

    There is a better option than the one John posted, because with that solution you can no longer search on the value.

    Back to the problem:

    The problem is that, by default, a field value is indexed as a single term (the complete string). If that term is longer than 32766 bytes, it cannot be stored in Lucene.

    Older versions of Lucene only registered a warning when a term was too long (and ignored the value). Newer versions throw an exception. See the bugfix: https://issues.apache.org/jira/browse/LUCENE-5472

    Solution:

    The best option is to define a (custom) analyzer on the field that holds the long string value. The analyzer splits the long string into smaller terms, which fixes the problem of overlong terms.
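    As a rough sketch (the index, type, and field names here are hypothetical), an Elasticsearch 1.x mapping that analyzes the long field could look like this; the standard analyzer breaks the string into many small terms instead of one immense term:

        curl -XPUT 'http://localhost:9200/myindex' -d '
        {
          "mappings": {
            "mydoc": {
              "properties": {
                "big_text": {
                  "type": "string",
                  "analyzer": "standard"
                }
              }
            }
          }
        }'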

    Don't forget to also add an analyzer to the "_all" field if you are using that functionality.
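    If you do use "_all", the analyzer can be set on it in the same mapping request (again a sketch with hypothetical names):

        curl -XPUT 'http://localhost:9200/myindex' -d '
        {
          "mappings": {
            "mydoc": {
              "_all": { "analyzer": "standard" }
            }
          }
        }'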

    Analyzers can be tested with the REST API: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html
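    For example, assuming a node running on localhost, a quick check of what the standard analyzer produces:

        curl -XGET 'http://localhost:9200/_analyze?analyzer=standard&pretty' -d 'this is a somewhat big string'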
