I've upgraded my Elasticsearch cluster from 1.1 to 1.2, and now I get errors when indexing a somewhat big string:
{
  "error": "IllegalArgumentException[Document contains at least one immense term in field=… (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms.]"
}
If you really want not_analyzed on the property because you want to do some exact filtering, then you can use "ignore_above": 256.
Here is an example of how I use it in PHP:

'mapping' => [
    'type' => 'multi_field',
    'path' => 'full',
    'fields' => [
        '{name}' => [
            'type' => 'string',
            'index' => 'analyzed',
            'analyzer' => 'standard',
        ],
        'raw' => [
            'type' => 'string',
            'index' => 'not_analyzed',
            'ignore_above' => 256,
        ],
    ],
],
In your case you probably want to do as John Petrone told you and set "index": "no", but for anyone else finding this question after searching on that exception, like me, your options are (sketched in the mapping fragment after this list):
"index": "no"
"index": "analyzed"
"index": "not_analyzed"
"ignore_above": 256
It depends on whether and how you want to filter on that property.
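To make those options concrete, here is a minimal mapping fragment in plain Elasticsearch JSON (rather than the PHP array above); the type and field names are made up for illustration:

"mappings": {
  "my_type": {
    "properties": {
      "title":  { "type": "string", "index": "analyzed", "analyzer": "standard" },
      "status": { "type": "string", "index": "not_analyzed", "ignore_above": 256 },
      "blob":   { "type": "string", "index": "no" }
    }
  }
}

Here title is tokenized and full-text searchable, status can be filtered on its exact value (values longer than 256 characters are simply not indexed), and blob is kept in _source only and cannot be queried.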
If you are using searchkick, upgrade Elasticsearch to >= 2.2.0 and make sure you are using searchkick 1.3.4 or later.
This version of searchkick sets ignore_above = 256 by default, so you won't get this error when the UTF-8 encoding of a value is longer than 32766 bytes.
This is discussed here.
I needed to change the index part of the mapping to no instead of not_analyzed. That way the value is not indexed. It remains available in the returned document (from a search, a get, …) but I can't query it.
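For reference, a minimal sketch of such a mapping in plain JSON (the index, type and field names are hypothetical):

PUT /my_index/_mapping/my_type
{
  "my_type": {
    "properties": {
      "big_field": { "type": "string", "index": "no" }
    }
  }
}

The value still comes back in _source, but no terms are generated for it, so the immense-term error goes away.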
So you are running into an issue with the maximum size for a single term. When you set a field to not_analyzed it will be treated as one single term. The maximum size for a single term in the underlying Lucene index is 32766 bytes, which is, I believe, hard coded.
Your two primary options are to either change the type to binary or to continue to use string but set the index type to "no".
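The "index": "no" variant is sketched under the answer above; for the binary alternative, a rough sketch (again with hypothetical names) would be the following. Note that binary fields expect base64-encoded values and are not indexed at all:

PUT /my_index/_mapping/my_type
{
  "my_type": {
    "properties": {
      "big_field": { "type": "binary" }
    }
  }
}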
I've stumbled upon the same error message with Drupal's Search API attachments module:
Document contains at least one immense term in field="saa_saa_file_entity" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms.
Changing the field's type from string to Fulltext (in /admin/config/search/search-api/index/elastic_index/fields) solved the problem for me.
There is a better option than the one John posted, because with that solution you can't search on the value anymore.
Back to the problem:
The problem is that, by default, field values are used as a single term (the complete string). If that term/string is longer than 32766 bytes it can't be stored in Lucene.
Older versions of Lucene only register a warning when terms are too long (and ignore the value). Newer versions throw an exception. See the bugfix: https://issues.apache.org/jira/browse/LUCENE-5472
Solution:
The best option is to define a (custom) analyzer on the field with the long string value. The analyzer can split the long string into smaller strings/terms, which fixes the problem of too-long terms.
Don't forget to also add an analyzer to the "_all" field if you are using that functionality.
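As an illustration, here is a rough sketch of what such an index definition could look like; the index, type, field and analyzer names (my_index, my_type, big_field, long_text_analyzer) are made up, and the standard tokenizer is just one possible way to break the long value into smaller terms:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "long_text_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "_all": { "analyzer": "long_text_analyzer" },
      "properties": {
        "big_field": {
          "type": "string",
          "analyzer": "long_text_analyzer"
        }
      }
    }
  }
}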
Analyzers can be tested with the REST API: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html
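For example, something along these lines should work against a 1.x cluster (using the hypothetical index and analyzer names from the sketch above; newer Elasticsearch versions expect a JSON body with "analyzer" and "text" instead of query parameters):

curl -XGET 'localhost:9200/my_index/_analyze?analyzer=long_text_analyzer&pretty' -d 'some very long text that previously blew past the 32766 byte term limit'

The response lists the tokens the analyzer produces, so you can check that the long value really gets split into small terms.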