Ensuring ElasticSearch is in Sync with Database

Question

I'm considering a daily script to catch any cases where an update failed to reach the ES server. I don't yet have a high-availability setup, and even with one, this is probably good practice whenever data is duplicated between the DB and ES. Before putting this script together, I thought I'd check whether I'm going about this the right way, and whether there are any libraries or techniques I should use.

The script will simply retrieve all IDs from the database and all IDs from ElasticSearch, where created_at < current_time (a snapshot of the time taken when the script starts, since the current time is a moving target as the script runs). It will then add documents to and remove documents from ElasticSearch based on the differences between the two ID sets.
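To make the comparison concrete, here is a minimal sketch of the set difference in plain Ruby. The helper name `sync_plan` and the sample IDs are hypothetical; fetching the real IDs from the database and from ElasticSearch is left out.

```ruby
require "set"

# Hypothetical helper: compare the IDs found in the database with the IDs
# found in ElasticSearch and work out what needs to change in the index.
def sync_plan(db_ids, es_ids)
  db = db_ids.to_set
  es = es_ids.to_set
  {
    to_index:  (db - es).to_a.sort, # present in the DB, missing from ES
    to_delete: (es - db).to_a.sort  # present in ES, no longer in the DB
  }
end

# Made-up IDs for illustration only.
plan = sync_plan([1, 2, 3, 5], [2, 3, 4])
puts "index:  #{plan[:to_index].join(', ')}"
puts "delete: #{plan[:to_delete].join(', ')}"
```

Note that this only detects missing and orphaned documents; a record that exists on both sides but has stale content in ES would not be noticed.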

Does this sound like a reasonable approach?

Answer 1:

To answer my own question: this is not the best approach.

A simpler, if more resource-intensive, approach is to rebuild the entire index periodically. Rebuilding the live index in place is difficult in production, as it would cause minutes or hours of downtime, so the trick is to build a new index alongside the old one and then switch over to it. In ElasticSearch, you can't rename an index, but you can use aliases: the application queries an alias, and the alias is repointed from the old index to the new one once the rebuild finishes.
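As a rough illustration of the switch, the sketch below builds the request body for ElasticSearch's `_aliases` endpoint, which applies a list of `remove` and `add` actions in one call. The helper name `alias_swap_body` and the index and alias names are made up for the example; sending the request is not shown.

```ruby
require "json"

# Hypothetical helper: build the body for a POST to the /_aliases endpoint
# that moves an alias from the old index to the freshly built one.
def alias_swap_body(alias_name, old_index, new_index)
  {
    actions: [
      { remove: { index: old_index, alias: alias_name } },
      { add:    { index: new_index, alias: alias_name } }
    ]
  }
end

# Assumed names: the application searches "articles", which currently
# points at "articles_v1"; "articles_v2" is the rebuilt index.
body = alias_swap_body("articles", "articles_v1", "articles_v2")
puts JSON.pretty_generate(body)
```

Because both actions go in a single request, searches against the alias should not see a window in which it points at no index. The old index can be deleted once nothing references it.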

There's a discussion of the approach here and a rake task for Tire users here.

Answer 2:

Please have a look at the jdbc-river plugin. It is fairly stable and can be used to sync data between a database and ES.

Source: https://stackoverflow.com/questions/11952558/ensuring-elasticsearch-is-in-sync-with-database

Tags

ruby-on-rails

ruby

ElasticSearch

tire