Question
I'm using incremental checkpoints with RocksDB and saving the checkpoints to a remote destination (S3 in my case). What happens if someone deletes the JobManager server (where the checkpoint coordinator runs) and reinstalls it? By losing the checkpoint coordinator, do I also lose the ability to recover the state from the checkpoints? As far as I know, the coordinator holds all the references to the checkpoints.
Answer 1:
If you run Flink with high availability enabled, Flink stores pointers to its checkpoints in ZooKeeper. If the JobManager fails, Flink recovers all checkpoints from ZooKeeper and can resume the jobs from the latest completed checkpoint.
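For illustration, a minimal configuration sketch of such a setup might look like the fragment below. It assumes a Flink 1.x style flink-conf.yaml; newer releases may use different key names, so check the configuration reference for your version. The ZooKeeper hosts and S3 paths are placeholders, not values from the question.

```yaml
# Sketch only: hosts and bucket paths are hypothetical placeholders.

# Enable ZooKeeper-based high availability so checkpoint pointers
# survive the loss of the JobManager machine.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
# JobManager metadata is written here; ZooKeeper holds pointers to it.
high-availability.storageDir: s3://my-bucket/flink/ha

# RocksDB state backend with incremental checkpoints written to S3.
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
```

With a setup along these lines, both the checkpoint data and the references to it live outside the JobManager host, so a reinstalled JobManager pointed at the same ZooKeeper quorum and storage directory should be able to find the latest completed checkpoint.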
Source: https://stackoverflow.com/questions/55613112/is-it-possible-to-recover-after-losing-the-checkpoint-coordinator