How can I handle duplicate data in Elasticsearch?

Question


I have used parent & child mapping to normalize data, but as far as I understand there is no way to retrieve fields from the _parent document.

Here is the mapping of my index:

{
  "mappings": {
    "building": {
      "properties": {
        "name": {
          "type": "string"
        }
      }
    },
    "flat": {
      "_parent": {
        "type": "building"
      },
      "properties": {
        "name": {
          "type": "string"
        }
      }
    },
    "room": {
      "_parent": {
        "type": "flat"
      },
      "properties": {
        "name": {
          "type": "string"
        },
        "floor": {
          "type": "long"
        }
      }
    }
  }
}
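For context, documents in a hierarchy like this are indexed with explicit parent routing. A minimal bulk-request sketch under this mapping (the index name `estates` and the IDs are assumptions; note that in the mapping-type parent/child era, grandchildren such as `room` also need explicit `_routing` to the top-level ancestor so they land on the same shard):

```json
{ "index": { "_index": "estates", "_type": "building", "_id": "1" } }
{ "name": "Building A" }
{ "index": { "_index": "estates", "_type": "flat", "_id": "11", "_parent": "1" } }
{ "name": "Flat 11" }
{ "index": { "_index": "estates", "_type": "room", "_id": "111", "_parent": "11", "_routing": "1" } }
{ "name": "Room 1", "floor": 3 }
```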

Now, I'm trying to find the best way of storing flat_name and building_name in the room type. I won't query these fields, but I should be able to retrieve them when I query other fields like floor.

There will be millions of rooms and I don't have much memory, so I suspect that these duplicate values may cause an out-of-memory error. For now, the flat_name and building_name fields have the "index": "no" property, and I have turned on compression for the _source field.
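The setup described above would look roughly like this as a mapping fragment (a sketch only; `_source` compression via `"compress": true` applies to older Elasticsearch versions, where this question originates):

```json
{
  "room": {
    "_source": {
      "compress": true
    },
    "properties": {
      "flat_name": { "type": "string", "index": "no" },
      "building_name": { "type": "string", "index": "no" }
    }
  }
}
```

With "index": "no", the duplicated names are stored only in _source and never analyzed or indexed, so they add disk usage but little to the index structures held in memory.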

Do you have any efficient suggestions for avoiding duplicate values, such as running multiple queries, some hacky way to get fields from the _parent document, or is denormalized data the only way to handle this kind of problem?
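One non-denormalized option worth noting: in Elasticsearch 2.x and later, a has_parent query accepts inner_hits, which returns the matching parent document alongside each child hit. A hedged sketch of what a query on floor that also fetches the parent flat might look like (field names taken from the mapping above; verify the exact syntax against your Elasticsearch version):

```json
{
  "query": {
    "bool": {
      "must": [
        { "term": { "floor": 3 } },
        {
          "has_parent": {
            "parent_type": "flat",
            "query": { "match_all": {} },
            "inner_hits": {}
          }
        }
      ]
    }
  }
}
```

Each room hit would then carry an inner_hits section containing its flat's _source; nesting another has_parent inside the flat query could do the same for the building, at the cost of extra join work per query.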

Source: https://stackoverflow.com/questions/14060845/how-can-i-handle-duplicate-data-in-elasticsearch
