Consistent hashing as a way to scale writes


There are two reasons to use multiple nodes in a cluster:

  • Sharding, to limit the amount of data stored on each node
  • Replication, to reduce read load and to allow a node to be removed without data loss

The two are fundamentally different, but you can implement both: use consistent hashing to point each key at a set of nodes with a standard master/slave setup, rather than at a single node.
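One way to combine the two might look like the following sketch; the names, types, and routing helpers are illustrative assumptions, not a definitive implementation. Consistent hashing picks a shard, and within that shard the usual master/slave split sends writes to the master and spreads reads across the slaves. How a shard is chosen is sketched after the bucket-table description below.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ReplicaSet:
    master: str                                      # receives every write for its shard
    slaves: List[str] = field(default_factory=list)  # serve reads, stand by for promotion

def write_target(key: str, shard_for: Callable[[str], "ReplicaSet"]) -> str:
    """Writes for a key always go to the master of the shard the key hashes to."""
    return shard_for(key).master

def read_target(key: str, shard_for: Callable[[str], "ReplicaSet"]) -> str:
    """Reads can be served by any slave in the shard (or the master if there are none)."""
    rs = shard_for(key)
    return random.choice(rs.slaves) if rs.slaves else rs.master
```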

If the cluster is your primary data store rather than a cache, you will need a different redistribution strategy that includes copying the data.

My implementation is based on having the client hash each key into one of 64k buckets and keeping a table that maps each bucket to a node. Initially, all buckets map to node #1.
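A minimal sketch of that scheme follows; the hash function and the names are assumptions rather than the author's actual code. In practice each node number would identify the master of a replica set, as in the earlier sketch.

```python
import hashlib

NUM_BUCKETS = 64 * 1024              # 65,536 buckets

bucket_map = [1] * NUM_BUCKETS       # bucket index -> node number; everything starts on node #1

def bucket_of(key: str) -> int:
    """Client-side hash of a key into a fixed bucket (0 .. 65535)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:2], "big") % NUM_BUCKETS

def node_for(key: str) -> int:
    """Look the key's bucket up in the table to find its current node."""
    return bucket_map[bucket_of(key)]
```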

When node #1 gets too large, its slave is promoted to become master of node #2 and the table is updated to map half of node #1's buckets to node #2. At that point all reads and writes work with the new mapping, and you just need to clean up the keys that are now on the wrong node. Depending on the performance requirements, you can check all keys at once, or check a random selection of keys the way the expiry system does.
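A minimal sketch of the split and cleanup, continuing the bucket table from the previous sketch; the promotion itself (making node #1's slave the master of node #2) is a data-store operation and is not shown, and the `store` interface used here is an assumption.

```python
import random
from typing import List, Optional

def split_node(bucket_map: List[int], old_node: int, new_node: int) -> None:
    """Repoint every other bucket owned by old_node at new_node, i.e. move half of them."""
    owned = [b for b, n in enumerate(bucket_map) if n == old_node]
    for b in owned[::2]:
        bucket_map[b] = new_node

def cleanup_pass(store, this_node: int, bucket_map: List[int], bucket_of,
                 sample: Optional[int] = None) -> None:
    """Remove keys that now belong to a different node.

    `store` is assumed to expose keys() and delete(); if the cluster is the
    primary data store, copy each misplaced key to its new node before
    deleting it here. Pass `sample` to check only a random subset per call,
    the way a lazy expiry sweep would, instead of scanning everything at once.
    """
    keys = list(store.keys())
    if sample is not None:
        keys = random.sample(keys, min(sample, len(keys)))
    for key in keys:
        if bucket_map[bucket_of(key)] != this_node:
            store.delete(key)
```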
