not_analyzed field with doc_values still in fielddata cache

Submitted by 孤者浪人 on 2020-01-01 19:09:08

Question


While experimenting with fielddata vs. doc_values, I ran into a weird case. My earlier mapping didn't use doc values at all. In my new mapping, I added doc_values: true to every field except analyzed string fields (which don't support doc values) and booleans (not supported until 2.0).

In detail, here is how I proceeded:

Before reindexing all my data, I restarted my ES 1.7 cluster fresh and ran a query with sorting, aggregations and script fields to "warm up" the fielddata cache. Then I queried the _cat/fielddata endpoint to get an idea of fielddata cache usage. It looked something like this:

    curl -XGET 'localhost:9200/_cat/fielddata?v&fields=*'

    id      host   ip            node  total  items.desc.raw more_fields...
    rKX7... myhost 192.168.1.100 Doom  32.9mb 2.3mb          ...

As you can see, the field items.desc.raw used 2.3mb of heap space. The items field is of type nested and contains a string multi-field, desc, with a not_analyzed sub-field called raw. In short, the mapping of that nested field looks like this:

    "items": {
      "type": "nested",
      "properties": {
        "desc": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
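For comparison, here is a sketch of the same nested field after the change described below, with doc_values: true added only to the not_analyzed raw sub-field. It assumes ES 1.x mapping syntax; the analyzed desc field itself is left without doc values because analyzed strings cannot use them.

```json
"items": {
  "type": "nested",
  "properties": {
    "desc": {
      "type": "string",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        }
      }
    }
  }
}
```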

After adding doc_values: true to items.desc.raw, reindexing the whole index, and running some aggregations, sorts and script fields again to warm up the fielddata cache, I queried the _cat/fielddata endpoint again. Here was the result:

    curl -XGET 'localhost:9200/_cat/fielddata?v&fields=*'

    id      host   ip            node  total  items.desc.raw some_bools...
    tAB5... myhost 192.168.1.100 Yack  2.1mb  9.2kb          ...

So fielddata usage has indeed dropped drastically (from 32.9mb to 2.1mb in total), which is good. The remaining entries were mostly boolean fields (e.g. some_bools above), which was expected since booleans cannot use doc values before 2.0. To my surprise, however, my nested not_analyzed string field also appeared, albeit with much lower usage (9.2kb instead of 2.3mb).

What could be the cause of items.desc.raw still appearing in the fielddata cache?


Answer 1:


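Somehow I forgot about global ordinals. Global ordinals map each segment's term ordinals to a single shard-wide ordinal per unique term, and features such as terms aggregations rely on them. They cannot be stored in doc values, so they are built on the heap and reported as fielddata usage. That is why items.desc.raw still appears in the fielddata cache after switching to doc_values, although with a much smaller footprint.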

See more details here
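As an illustration, a nested terms aggregation on the sub-field, like the search request body below, is the kind of request that makes Elasticsearch build global ordinals for items.desc.raw even though the field values themselves are read from doc values. The aggregation names items and descs are arbitrary placeholders, and the body is assumed to be sent to the _search endpoint of the index in question. After running it, one would expect _cat/fielddata to list the field again with a small footprint, as in the question.

```json
{
  "size": 0,
  "aggs": {
    "items": {
      "nested": { "path": "items" },
      "aggs": {
        "descs": {
          "terms": { "field": "items.desc.raw" }
        }
      }
    }
  }
}
```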



Source: https://stackoverflow.com/questions/31623636/not-analyzed-field-with-doc-values-still-in-fielddata-cache
