Ensuring ElasticSearch is in Sync with Database

Submitted by 陌路散爱 on 2019-12-24 03:23:27

Question


I'm considering a daily script that does the following, to cover cases where updates to the ES server failed. (I don't yet have a high-availability setup, and even with one this is probably good practice whenever data is duplicated between the DB and ES.) Before putting the script together, I wanted to check whether I'm going about this the right way and whether there are any libraries or techniques I should use.

The script will simply retrieve all IDs from the database and all IDs from ElasticSearch where created_at < current_time (a snapshot of the current time taken up front, since "now" is a moving target while the script runs). It will then add documents to and remove documents from ElasticSearch based on the differences between these two ID sets.

Does this sound like a reasonable approach?
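The ID comparison described above can be sketched in Python roughly as follows. This is only an illustration of the set-difference step: `plan_sync` and `SyncPlan` are hypothetical names, and fetching the IDs from the database and from ElasticSearch (with the created_at cutoff applied to both) is assumed to happen elsewhere.

```python
from typing import Iterable, NamedTuple, Set


class SyncPlan(NamedTuple):
    """Hypothetical container for the result of the comparison."""
    to_index: Set[str]   # IDs present in the DB but missing from ES
    to_delete: Set[str]  # IDs present in ES but no longer in the DB


def plan_sync(db_ids: Iterable[str], es_ids: Iterable[str]) -> SyncPlan:
    """Compare two ID collections taken at the same cutoff time."""
    db_set = set(db_ids)
    es_set = set(es_ids)
    return SyncPlan(to_index=db_set - es_set, to_delete=es_set - db_set)


if __name__ == "__main__":
    # Example IDs are made up for illustration.
    plan = plan_sync(db_ids=["1", "2", "3"], es_ids=["2", "3", "4"])
    print(sorted(plan.to_index), sorted(plan.to_delete))
```

Note that comparing IDs only finds missing and orphaned documents. A record that exists on both sides but whose indexed content is stale would not be detected this way.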


Answer 1:


To answer my own question: this is not the best approach.

A simpler, if more resource-intensive, approach is to rebuild the entire index periodically. Rebuilding in place is difficult in production because it would cause minutes or hours of downtime, so the trick is to build a new index alongside the old one and then switch to it. In ElasticSearch you can't rename an index, but you can use aliases: the application queries an alias, and the alias is repointed at the new index once the rebuild finishes.
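As a rough sketch of the switch, the request body for ElasticSearch's `_aliases` endpoint can be built like this. The helper name `build_alias_swap` and the index and alias names are hypothetical; sending the body (for example with an HTTP POST to `/_aliases`) is left out, since it depends on the client in use.

```python
import json
from typing import Any, Dict, List, Optional


def build_alias_swap(alias: str, new_index: str,
                     old_index: Optional[str] = None) -> Dict[str, Any]:
    """Build an `_aliases` request body that repoints `alias` at `new_index`.

    Both actions go into one request, so the alias never points at
    nothing between the remove and the add.
    """
    actions: List[Dict[str, Any]] = []
    if old_index is not None:
        # Detach the alias from the index being retired.
        actions.append({"remove": {"index": old_index, "alias": alias}})
    # Attach the alias to the freshly built index.
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    # "products", "products_v1" and "products_v2" are made-up names.
    body = build_alias_swap("products", "products_v2", old_index="products_v1")
    print(json.dumps(body, indent=2))
```

Once the alias points at the new index and nothing reads from the old one, the old index can be deleted.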

There's a discussion of the approach here and a rake task for Tire users here.




Answer 2:


Please have a look at the jdbc-river plugin. It is fairly stable and can be used to sync data between a database and ES.



Source: https://stackoverflow.com/questions/11952558/ensuring-elasticsearch-is-in-sync-with-database
