Redis failover with StackExchange / Sentinel from C#

北恋  2020-12-24 08:58

We're currently using Redis 2.8.4 and StackExchange.Redis (and loving it) but don't have any sort of protection against hardware failures etc. at the moment. I'm trying to get failover working via Sentinel.

4 Answers
  •  情歌与酒  2020-12-24 09:49

    I was able to spend some time last week with the Linux guys testing scenarios and working on the C# side of this implementation and am using the following approach:

    • Read the sentinel addresses from config and create a ConnectionMultiplexer to connect to them
    • Subscribe to the +switch-master channel
    • Ask each sentinel server in turn which instance it considers the master and which the slaves, and compare the answers to make sure they all agree
    • Create a new ConnectionMultiplexer with the redis server addresses read from sentinel, connect, and add event handlers for ConnectionFailed and ConnectionRestored
    • When I receive the +switch-master message I call Configure() on the redis ConnectionMultiplexer
    • As a belt and braces approach I always call Configure() on the redis ConnectionMultiplexer 12 seconds after receiving a ConnectionFailed or ConnectionRestored event when the connection type is ConnectionType.Interactive (a sketch of the whole flow follows this list)
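
    The sketch below ties those steps together in one place. It's a minimal illustration of the approach, not a drop-in implementation: the sentinel host names and ports, the "mymaster" service name, the helper class itself and the 12-second delay are placeholders you'd swap for your own configuration.

        using System;
        using System.Linq;
        using System.Threading.Tasks;
        using StackExchange.Redis;

        public class SentinelFailoverClient
        {
            private ConnectionMultiplexer _sentinelMux;
            private ConnectionMultiplexer _redisMux;

            public void Connect()
            {
                // 1. Connect to the sentinels read from config (hosts are placeholders).
                var sentinelConfig = new ConfigurationOptions
                {
                    CommandMap = CommandMap.Sentinel, // sentinels speak a restricted command set
                    TieBreaker = "",
                    AbortOnConnectFail = false
                };
                sentinelConfig.EndPoints.Add("sentinel1", 26379);
                sentinelConfig.EndPoints.Add("sentinel2", 26379);
                sentinelConfig.EndPoints.Add("sentinel3", 26379);
                _sentinelMux = ConnectionMultiplexer.Connect(sentinelConfig);

                // 2. Subscribe to +switch-master so we hear about failovers.
                _sentinelMux.GetSubscriber().Subscribe("+switch-master", (channel, message) =>
                {
                    // Sentinel promoted a new master; re-discover the writable endpoint.
                    _redisMux?.Configure();
                });

                // 3. Ask each sentinel which endpoint it considers master; they must agree.
                var masters = sentinelConfig.EndPoints
                    .Select(ep => _sentinelMux.GetServer(ep)
                                              .SentinelGetMasterAddressByName("mymaster"))
                    .Distinct()
                    .ToList();
                if (masters.Count != 1)
                    throw new InvalidOperationException("Sentinels disagree about the master.");

                // 4. Connect the data multiplexer to the agreed master and hook the events.
                var redisConfig = new ConfigurationOptions { AbortOnConnectFail = false };
                redisConfig.EndPoints.Add(masters[0]);
                _redisMux = ConnectionMultiplexer.Connect(redisConfig);
                _redisMux.ConnectionFailed += OnConnectionStateChanged;
                _redisMux.ConnectionRestored += OnConnectionStateChanged;
            }

            // 5/6. Belt and braces: re-run Configure() 12 seconds after any interactive
            // connection state change, in case a +switch-master message was missed.
            private void OnConnectionStateChanged(object sender, ConnectionFailedEventArgs e)
            {
                if (e.ConnectionType != ConnectionType.Interactive) return;
                Task.Delay(TimeSpan.FromSeconds(12)).ContinueWith(_ => _redisMux.Configure());
            }
        }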

    Generally I find I am reconfigured and working again about 5 seconds after losing the redis master. During that time I can't write, but I can still read (since reads can be served by a slave). 5 seconds is OK for us since our data updates very quickly and becomes stale after a few seconds (and is subsequently overwritten).
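
    If you want to be explicit about reads surviving a master outage, StackExchange.Redis lets you steer an individual read at a slave with a command flag. A one-line sketch, where db is an IDatabase obtained from the multiplexer and the key is made up:

        // PreferSlave routes the read to a slave when one is available, so
        // reads keep working while the master is being failed over.
        RedisValue value = db.StringGet("some-key", CommandFlags.PreferSlave);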

    One thing I wasn't sure about was whether I should remove the redis server from the redis ConnectionMultiplexer when an instance goes down, or let it keep retrying the connection. I decided to leave it retrying, since the instance rejoins the mix as a slave as soon as it comes back up. I did some performance testing with and without a connection being retried and it seemed to make little difference. Maybe someone can clarify whether this is the correct approach.

    Every now and then, bringing back an instance that was previously a master did seem to cause some confusion: a few seconds after it came back up, writes would fail with a "READONLY" error, indicating I was writing to a slave. This was rare, but I found that my catch-all approach of calling Configure() 12 seconds after a connection state change caught this problem. Calling Configure() seems very cheap, so calling it twice regardless of whether it's necessary seemed fine.
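
    If you'd rather react immediately instead of waiting for the delayed Configure(), the "READONLY" case can also be folded into the write path. A sketch, reusing db and _redisMux from the earlier example (key and value are made up):

        try
        {
            db.StringSet("some-key", "some-value");
        }
        catch (RedisServerException ex) when (ex.Message.StartsWith("READONLY"))
        {
            // The write landed on an old master that rejoined as a slave;
            // force role re-discovery, then let the caller decide about retrying.
            _redisMux.Configure();
            throw;
        }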

    Now that I have slaves I have offloaded some of my data cleanup code which does key scans to the slaves, which makes me happy.
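
    For what it's worth, targeting a slave for the scan is straightforward: ask the multiplexer for its endpoints and pick one that reports itself as a slave. The key pattern below is a placeholder:

        // Run the (potentially expensive) key scan on a connected slave,
        // keeping SCAN traffic off the master.
        var slave = _redisMux.GetEndPoints()
                             .Select(ep => _redisMux.GetServer(ep))
                             .FirstOrDefault(s => s.IsConnected && s.IsSlave);
        if (slave != null)
        {
            foreach (var key in slave.Keys(database: 0, pattern: "stale:*"))
            {
                // ... cleanup logic for each matching key
            }
        }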

    All in all I'm pretty satisfied; it's not perfect, but for something that should very rarely happen it's more than good enough.
