GridGain failover of master (sender) node

Question

I am working on a batch processing problem. The solution needs to handle failing hardware.

There is a master node (which initiates task executions) and worker nodes which execute the jobs. I know how failover of worker nodes works, but I could not find any information about failover of master nodes. Whenever the master node that started a task fails, the whole task is canceled.

Is there any way to finish processing the task in that case?

Could you suggest the best way of implementing failover of the master node?

Kind Regards, Kuba

Answer 1:

Whenever your master node dies, there is basically no one left to perform the "reduce" step of your MapReduce task.

There are several ways you can try mitigating this problem:

1. Save intermediate checkpoints using GridCheckpointSpi (the GridTaskSession.saveCheckpoint(..) API). When your task restarts after a node crash, check whether a checkpoint was saved and resume from it.
2. Do the same as in (1), but use the data grid instead (GridCache API).
3. If you don't care about the "reduce" step, have your jobs ignore the "cancel" call and save their results in the data grid when they are done.
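
The checkpoint-and-resume idea behind these options can be sketched in plain Java. This is only an illustration, not GridGain code: the Map stands in for whatever external store you use (a checkpoint SPI or a data grid cache), and the names ResumableTask, runJob, execute and failAfter are hypothetical. In a real deployment the store would have to live outside the master's process so that it survives the crash.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ResumableTask {

    // Hypothetical job body: the "work" is just the length of the job id.
    static int runJob(String jobId) {
        return jobId.length();
    }

    /**
     * Runs every job that has no saved result yet, then performs the reduce step.
     * The store is a stand-in for an external checkpoint store or data grid cache.
     * failAfter simulates the master crashing after that many newly executed jobs;
     * pass a negative value to disable the simulated crash.
     */
    static int execute(List<String> jobIds, Map<String, Integer> store, int failAfter) {
        int executed = 0;
        for (String id : jobIds) {
            if (store.containsKey(id)) {
                continue; // result survived an earlier run, do not redo the job
            }
            if (executed == failAfter) {
                throw new IllegalStateException("simulated master crash");
            }
            store.put(id, runJob(id)); // save the intermediate result right away
            executed++;
        }
        // Reduce step: only reached once every job has a saved result.
        int sum = 0;
        for (String id : jobIds) {
            sum += store.get(id);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> store = new ConcurrentHashMap<>();
        List<String> jobs = List.of("a", "bb", "ccc", "dddd");

        try {
            execute(jobs, store, 2); // first master dies after two jobs
        } catch (IllegalStateException e) {
            System.out.println("master failed with " + store.size() + " results saved");
        }

        // A restarted master picks up the saved results and finishes the task.
        System.out.println("reduced result: " + execute(jobs, store, -1));
    }
}
```

The second call to execute only runs the two jobs that were missing, which is the behaviour you would want from a restarted task that finds earlier results in the checkpoint store or cache.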

--Best

Source: https://stackoverflow.com/questions/5259899/gridgain-failover-of-master-sender-node

Tags

MapReduce

parallel-processing

grid-computing

gridgain
