Dynamic Template not working for short, byte & float

Posted by 隐身守侯 on 2019-12-24 18:36:26

Question


I am trying to create an index template that uses dynamic mapping.

Here is what I wrote. In 6.2.1, only boolean, date, double, long, object, and string are detected automatically, so I am having trouble mapping float, short, and byte.

If I index 127, it is mapped to short by short_fields, which is fine. But when I index a value such as 325566, I get the exception "Numeric value (325566) out of range of Java short". I want to suppress this and let long_fields handle the value so that it is mapped to long. I have tried coerce: false and ignore_malformed: true, but neither worked as expected.

"dynamic_templates": [
  {
    "short_fields": {
      "match": "*",
      "match_mapping_type": "long",
      "mapping": {
        "type": "short",
        "doc_values": true
      }
    }
  },
  {
    "long_fields": {
      "match": "*",
      "match_mapping_type": "long",
      "mapping": {
        "type": "long",
        "doc_values": true
      }
    }
  },
  {
    "byte_fields": {
      "match": "*",
      "match_mapping_type": "byte",
      "mapping": {
        "type": "byte",
        "doc_values": true
      }
    }
  }
]

Answer 1:


Unfortunately, it is not possible to make Elasticsearch choose the smallest data type possible for you. There are plenty of workarounds, but let me first explain why it does not work.

Why does it not work?

Dynamic mapping templates let you override the default dynamic type matching in three ways:

  • by matching the name of the field,
  • by matching the type Elasticsearch has guessed for you,
  • and by matching a path in the document.

Elasticsearch picks the first matching rule that works. In your case, the first rule, short_fields, always works for any integer, because it accepts any field name and a guessed type long.

That's why it works for 127 but doesn't work for 325566.
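
The first-match behaviour can be sketched in a few lines of Python. This is only a simplified model, not Elasticsearch's actual code: the names `TEMPLATES`, `guessed_type` and `pick_template` are made up for illustration, and the type detection is reduced to the handful of cases needed here.

```python
from fnmatch import fnmatchcase

# Simplified stand-ins for the first two templates from the question, in order.
TEMPLATES = [
    ("short_fields", {"match": "*", "match_mapping_type": "long", "type": "short"}),
    ("long_fields", {"match": "*", "match_mapping_type": "long", "type": "long"}),
]


def guessed_type(value):
    """Rough model of JSON type detection: every whole number is a long."""
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def pick_template(field_name, value):
    """Return the name of the first template whose conditions all match."""
    detected = guessed_type(value)
    for name, rule in TEMPLATES:
        name_matches = fnmatchcase(field_name, rule["match"])
        type_matches = rule["match_mapping_type"] == detected
        if name_matches and type_matches:
            return name
    return None


# The size of the value never enters the decision, so both lines print
# the same template name.
print(pick_template("count", 127))
print(pick_template("count", 325566))
```

Because long_fields has exactly the same conditions as short_fields, it is never reached in this model.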

To illustrate this point better, let's change "match_mapping_type" in the first rule like this:

"match_mapping_type": "short",

Elasticsearch does not accept it and returns an error:

  {
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [doc]: No field type matched on [short], \
possible values are [object, string, long, double, boolean, date, binary]"
  }

But how can we make Elasticsearch pick the right types?

Here are some of the options.

Define strict mapping manually

This gives you full control over the selection of types.
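
As a rough sketch, the request body for such a mapping could be built like this. The field names `customerAge` and `revenue` and the helper `explicit_mapping` are hypothetical, the type name `doc` is taken from the error message above, and `"dynamic": "strict"` is one possible reading of "strict": with it, documents containing unmapped fields are rejected.

```python
import json


def explicit_mapping(field_types, type_name="doc"):
    """Build a mapping body that declares every field type explicitly."""
    properties = {name: {"type": es_type} for name, es_type in field_types.items()}
    return {
        "mappings": {
            type_name: {
                # Assumption: reject documents that contain unmapped fields.
                "dynamic": "strict",
                "properties": properties,
            }
        }
    }


# Hypothetical fields; the types are chosen by hand, not guessed.
body = explicit_mapping({"customerAge": "byte", "revenue": "long"})
print(json.dumps(body, indent=2))
```

The printed JSON can then be sent as the body of the index creation request.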

Use the default long

Postpone "shrinking" data until it starts being a performance problem.

In fact, using smaller data types will only affect searching/indexing performance, not the storage required. As long as you are fine with dynamic mappings, Elasticsearch manages them for you pretty well.

Mark field names with type information

Since Elasticsearch is not able to tell a byte from a long, you can determine the type beforehand and add the type information to the field name, like customerAge_byte or revenue_long.

Then you will be able to use a prefix/suffix match like this:

    {
      "bytes_as_longs": {
        "match_mapping_type": "long",
        "match":   "*_byte",
        "mapping": {
          "type": "byte"
        }
      }
    }
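
On the client side, the renaming could look roughly like the sketch below. It assumes you already know the value range of each field for the whole dataset; the helpers `smallest_integer_type` and `tag_fields` and the example fields are hypothetical. The ranges are those of the Elasticsearch integer types byte, short, integer and long.

```python
# Inclusive ranges of the Elasticsearch integer types, smallest first.
INTEGER_RANGES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(lowest, highest):
    """Return the narrowest type that holds every value in [lowest, highest]."""
    for name, type_min, type_max in INTEGER_RANGES:
        if type_min <= lowest and highest <= type_max:
            return name
    raise ValueError("range does not fit into a 64-bit integer")


def tag_fields(doc, field_types):
    """Append the chosen type to each known field name, e.g. customerAge_byte."""
    tagged = {}
    for name, value in doc.items():
        suffix = field_types.get(name)
        tagged[f"{name}_{suffix}" if suffix else name] = value
    return tagged


# The ranges are decided once per field, not per document.
field_types = {
    "customerAge": smallest_integer_type(0, 127),
    "revenue": smallest_integer_type(0, 2**40),
}
print(tag_fields({"customerAge": 42, "revenue": 325566, "note": "ok"}, field_types))
```

Deciding the type once per field, rather than per value, keeps the same field from ending up under two different names in different documents.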

Please choose the approach that fits your needs best.

Why Elasticsearch takes longs

Elasticsearch probably takes longs for any integer input because of how JSON defines numbers (see json.org): there is a single number type, with no distinction between integer widths.

From a single value such as 0 or 1, it is not possible to tell whether the field holds integers or longs across the entire dataset. Elasticsearch has to guess the type from the first value it sees, so it takes the safest option.
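
A small Python illustration of the same point, using the standard `json` module as a stand-in for any JSON parser (the helper `parsed_type_name` is made up for this example): the text of a JSON number carries no width information, so a small and a large integer come out as the same type.

```python
import json


def parsed_type_name(json_text):
    """Parse a JSON value and return the name of the resulting Python type."""
    return type(json.loads(json_text)).__name__


# Nothing in the JSON text says "byte", "short" or "long".
small = parsed_type_name("127")
large = parsed_type_name("325566")
print(small, large, small == large)
```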


Hope that helps!



Source: https://stackoverflow.com/questions/48982594/dynamic-template-not-working-for-short-byte-float
