Keep Solr slaves in sync

We have a master-slave setup running Solr 6.5.0. There is a backend process running 24/7 which pushes its data towards the master server. No commit is done on master. The web frontend is accessing the slave. Replication poll interval is 1 hour.

All is fine so far, but as traffic grows, the CPU load on the slave is getting really high. I thought the best thing would be to add a second slave to the master and let the web servers connect to the two Solr slave machines via the existing load balancers. As far as I understand, the two slaves will handle their replication independently, and each will poll the master at a different time.

As the master receives new data 24/7, I'm worried that the two machines will not always have the same data set/version. Is there a low-administration way to force both slaves to poll the master for new data at the same time? (I.e. I'm trying to avoid setting up a real Solr cluster, as multiple slaves will fit our needs.)

The problem is the following: between polls, your slaves could potentially be out of sync with each other, and in your case the poll interval is 1 hour.

The option with minimal effort is to force replication on both slaves at the same time by calling this command on each of them:

http://slave_host:port/solr/core_name/replication?command=fetchindex

However, I'm not sure how often you can call this command; most likely you can't do it every minute or so.
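As a rough illustration of calling that command on both slaves at (nearly) the same moment, here is a minimal Python sketch that could be run from cron. The host names, port and core name in `SLAVES` are placeholders, and the helper names `fetchindex_url` and `trigger_all` are made up for this example. It only sends the request to each slave in parallel; it does not wait for the slaves to finish downloading the index, so it narrows the out-of-sync window rather than removing it.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Placeholder slave core URLs (assumption): replace with your own hosts, ports and core name.
SLAVES = [
    "http://slave1:8983/solr/core_name",
    "http://slave2:8983/solr/core_name",
]


def fetchindex_url(base_url):
    """Build the replication handler URL that asks a slave to fetch the index."""
    return base_url.rstrip("/") + "/replication?command=fetchindex"


def _http_get(url, timeout=10):
    """Send a GET request and return the HTTP status code."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status


def trigger_all(slaves, get=_http_get):
    """Send fetchindex to every slave in parallel; return {url: status}."""
    urls = [fetchindex_url(s) for s in slaves]
    # One worker per slave so the requests go out at roughly the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        statuses = list(pool.map(get, urls))
    return dict(zip(urls, statuses))


if __name__ == "__main__":
    for url, status in trigger_all(SLAVES).items():
        print(status, url)
```

The `get` parameter exists only so the HTTP call can be swapped out, for example in a dry run that prints the URLs instead of contacting the slaves.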

Another possibility is to trigger replication whenever a commit is performed on the master index. You could do this by adding the following to the replication handler configuration on the master:

<str name="replicateAfter">commit</str>

For more information about it take a look here

The traditional master-slave setup is basically doing rsync over HTTP. So maybe you can rsync between the slaves (and reload the cores after the rsync).

Source: https://stackoverflow.com/questions/47771564/keep-solr-slaves-in-sync

Tags

solr

replication