Elasticsearch: is there a way to declare for all (possibly dynamic) subfields of an object field as string?

孤者浪人 submitted on 2019-12-30 11:25:10

Question


I have a doc_type with a mapping similar to this very simplified one:

{
   "test":{
      "properties":{
         "name":{
            "type":"string"
         },
         "long_searchable_text":{
            "type":"string"
         },
         "clearances":{
            "type":"object"
         }
      }
   }
}

The field clearances should be an object, with a series of alphanumeric identifiers for filtering purposes. A typical document will have this format:

{
    "name": "Lord Macbeth",
    "long_searchable_text": "Life's but a walking shadow, a poor player, that...",
    "clearances": {
        "glamis": "aa2862jsgd",
        "cawdor": "3463463551"
    }
}

The problem is that sometimes, during indexing, the first value indexed for a new field inside the clearances object is entirely numeric, as in the case above. This causes Elasticsearch to infer the type of that field as long. But that is an accident: the field may be alphanumeric in another document. When a later document containing an alphanumeric value in this field arrives, I get a parsing exception:

{"error":"MapperParsingException[failed to parse [clearances.cawdor]]; nested: NumberFormatException[For input string: \"af654hgss1\"]; ","status":400}

I tried to solve this with a dynamic template defined like this:

{
   "test":{
      "properties":{
         "name":{
            "type":"string"
         },
         "long_searchable_text":{
            "type":"string"
         },
         "clearances":{
            "type":"object"
         }
      }
   },
   "dynamic_templates":[
      {
         "source_template":{
            "match":"clearances.*",
            "mapping":{
               "type":"string",
               "index":"not_analyzed"
            }
         }
      }
   ]
}

But it still happens: if the first indexed document has a clearances.some_subfield value that can be parsed as a number, the field is inferred as numeric, and all subsequent documents with alphanumeric values in that subfield fail to be indexed.

I could list all current subfields in the mapping, but there are many of them and I expect their number to grow (each addition triggering a mapping update and a full reindex).

Is there a way to make this work without resorting to a full reindex every time a new subfield is added?


Answer 1:


You're almost there.

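Your dynamic template must match on clearances.*, and it must use path_match rather than a plain match. A match pattern is tested against the field name alone (cawdor), so clearances.* never matches it; a path_match pattern is tested against the full dotted path (clearances.cawdor). Note also that in the example below, dynamic_templates sits inside the test type mapping, not beside it as in your snippet.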
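
To see why the plain match never fires, here is a small sketch that uses shell glob matching as a stand-in for Elasticsearch's wildcard matching. It is an approximation for illustration only: glob_matches is a hypothetical helper, not part of Elasticsearch, and the two test strings are simply the field name and the full path from the question.

```shell
#!/bin/sh
# Illustration only: shell globs stand in for Elasticsearch's wildcard matching.
# glob_matches is a hypothetical helper, not an Elasticsearch API.
glob_matches() {
    # $1 = wildcard pattern, $2 = string to test against it
    case "$2" in
        $1) echo "yes" ;;
        *)  echo "no" ;;
    esac
}

pattern='clearances.*'
field_name='cawdor'             # what "match" is tested against
full_path='clearances.cawdor'   # what "path_match" is tested against

echo "match      on '$field_name': $(glob_matches "$pattern" "$field_name")"
echo "path_match on '$full_path': $(glob_matches "$pattern" "$full_path")"
```

Only the second test succeeds, which is why the template has to be written with path_match.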

Here's a runnable example: https://www.found.no/play/gist/df030f005da71827ca96

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes

curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
    "settings": {},
    "mappings": {
        "test": {
            "dynamic_templates": [
                {
                    "clearances_as_string": {
                        "path_match": "clearances.*",
                        "mapping": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            ]
        }
    }
}'


# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"test"}}
{"clearances":{"glamis":1234,"cawdor":5678}}
{"index":{"_index":"play","_type":"test"}}
{"clearances":{"glamis":"aa2862jsgd","cawdor":"some string"}}
'

# Do searches

curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "facets": {
        "cawdor": {
            "terms": {
                "field": "clearances.cawdor"
            }
        }
    }
}
'


Source: https://stackoverflow.com/questions/20401709/elasticsearch-is-there-a-way-to-declare-for-all-possibly-dynamic-subfields-of
