How does one handled skipped event numbers with Paxos?

本小妞迷上赌 提交于 2019-12-07 22:27:56

问题


If we are running multi-paxos then a node may see:

Propose(N)
Accept!(N,Vn)
Accept!(N+1,Vm)
Accept!(N+4,Vo) // huh? where is +2, +3?
Accept!(N+5,Vp)

This may be because either:

  • There was a stable leader but the network local to this node dropped else delayed +2 and +3.
  • There was an outage such that there were two attempts to propose such that +2 and +3 were failed rounds proposals

In general operations on the distributed finite state machine wont commute such that a node should apply all operations in order. This implies that a node needs to be able to distinguish between the two cases. If it is failed rounds of proposals the node has no problems. If it is lost messages it suggests that the node should wait till they turn up else try to recover the lost data (e.g. request a snapshot to reinitialise and catchup).

What are the options or strategies to handle this and what overhead do they create?

This question is inspired by In Paxos, can an Acceptor accept a different value after it has already accepted one?


回答1:


I can think of two methods to deal with this.

The simplest approach would be to have the node that is missing +2 and +3 to go back and try to propose no-ops in those slots. If there were decisions there, the node will learn the data in the prepare round. Otherwise, no-ops will be decided.

Another approach would be to have an out-of-band re-learning process. This may be necessary anyway: how does a node catch up if it joins the system after the others?

Or you can use a combination of both. The leader can propose no-ops for any holes in its history, the others can use the re-learning process. This is how my paxos system works.



来源:https://stackoverflow.com/questions/26501211/how-does-one-handled-skipped-event-numbers-with-paxos

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!