We\'re currently using Redis 2.8.4 and StackExchange.Redis (and loving it) but don\'t have any sort of protection against hardware failures etc at the moment. I\'m trying to
I was able to spend some time last week with the Linux guys testing scenarios and working on the C# side of this implementation and am using the following approach:
I find that generally I am working and reconfigured after about 5 seconds of losing the redis master. During this time I can't write but I can read (since you can read off a slave). 5 seconds is ok for us since our data updates very quickly and becomes stale after a few seconds (and is subsequently overwritten).
One thing I wasn't sure about was whether or not I should remove the redis server from the redis ConnectionMultiplexer when an instance goes down, or let it continue to retry the connection. I decided to leave it retrying as it comes back into the mix as a slave as soon as it comes back up. I did some performance testing with and without a connection being retried and it seemed to make little difference. Maybe someone can clarify whether this is the correct approach.
Every now and then bringing back an instance that was previously a master did seem to cause some confusion - a few seconds after it came back up I would receive an exception from writing - "READONLY" suggesting I can't write to a slave. This was rare but I found that my "catch-all" approach of calling Configure() 12 seconds after a connection state change caught this problem. Calling Configure() seems very cheap and therefore calling it twice regardless of whether or not it's necessary seemed OK.
Now that I have slaves I have offloaded some of my data cleanup code which does key scans to the slaves, which makes me happy.
All in all I'm pretty satisfied, it's not perfect but for something that should very rarely happen it's more than good enough.