What is the best way to index data on elasticsearch?

我怕爱的太早我们不能终老 提交于 2019-12-25 01:18:47

问题


I have 4 tables:

  1. country
  2. state
  3. city
  4. address

These tables are related by ids where country is the top parent:

  • state.countryId
  • city.stateId
  • address.cityId

I want to integrate elastic search on my application and want to know what is the best way to index these table?

Should i create 1 index for each tables so that i have 1 index for each of country, state, city and address?

Or should i denormalize the tables and create only 1 index and store all the data with redundancy?


回答1:


ES is not afraid of redundancy in your data, so I would clearly denormalize so that each document represents one address like this:

{
    "country_id": 1,
    "country_name": "United Stated of America",
    "state_id": 1,
    "state_name": "California"
    "state_code": "CA",
    "city_id": 1,
    "city_name": "San Mateo"
    "zip_code": 94402,
    "address": "400 N El Camino Real"
}

You can then aggregate your data on whatever city, state, country field you wish.

Your mileage may vary as it ultimately depends on how you want to query/aggregate your data, but it's much easier to query address data like this in a single index instead of hitting several indices.




回答2:


I like Val's answer, it is the most straight forward option. But if you really want to reduce duplication (for example to minimize size on disk) you could use parent-child mapping. It will make indexing and querying a bit more verbose though. I still sugges to go with "flat" mapping.

You asked "what if you need the individual country or state or city records?", I'd recommend to add an additional field (not_analyzed or integer) which would indicate which level of hierarchy this document represents. It is fine not to have fields which correspond to lower levels of hierarchy. This way you could easily have a filter on just searching states or countries.




回答3:


Here is a very useful article by @adrien-grand which elaborates on the subject of the trade-offs between creating many indexes, or less indexes and many types.

Hope it helps!



来源:https://stackoverflow.com/questions/37160625/what-is-the-best-way-to-index-data-on-elasticsearch

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!