Understanding Flink savepoints & checkpoints

Submitted on 2020-01-06 08:05:43

Question


Considering an Apache Flink streaming-application with a pipeline like this:

Kafka-Source -> flatMap 1 -> flatMap 2 -> flatMap 3 -> Kafka-Sink

where every flatMap function is a non-stateful operator (e.g. the normal .flatMap function of a DataStream).

How do checkpoints/savepoints work when an incoming message is pending at flatMap 3? Will the message be reprocessed after a restart, beginning from flatMap 1, or will it skip ahead to flatMap 3?

I am a bit confused, because the documentation seems to define application state as the state I can use in stateful operators, but I don't have any stateful operators in my application. Is the "processing progress" saved and restored at all, or will the whole pipeline be re-processed after a failure/restart?

And is there a difference between a failure (where Flink restores from a checkpoint) and a manual restart using a savepoint, regarding my previous questions?

I tried finding out myself (with checkpointing enabled, using EXACTLY_ONCE mode and the RocksDB backend) by placing a Thread.sleep() in flatMap 3 and then cancelling the job with a savepoint. However, this led to the Flink command-line tool hanging until the sleep was over, and even then flatMap 3 was executed and its output was even sent to the sink before the job got cancelled. So it seems I can not manually force this situation to analyze Flink's behaviour.
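For reference, a setup like the one described above can be sketched in PyFlink. This is only an illustrative configuration fragment, not the asker's actual code; the checkpoint interval is a placeholder:

```python
from pyflink.datastream import StreamExecutionEnvironment, CheckpointingMode
from pyflink.datastream.state_backend import EmbeddedRocksDBStateBackend

env = StreamExecutionEnvironment.get_execution_environment()

# Checkpoint every 10 s with exactly-once mode (interval is a placeholder).
env.enable_checkpointing(10_000, CheckpointingMode.EXACTLY_ONCE)

# Use the RocksDB state backend, as in the experiment described above.
env.set_state_backend(EmbeddedRocksDBStateBackend())
```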

In case "processing progress" is not saved/covered by checkpoints/savepoints as I described above, how could I make sure, for every message reaching my pipeline, that no given operator (flatMap 1/2/3) is ever re-processed in a restart/failure situation?


Answer 1:


When a checkpoint is taken, every task (parallel instance of an operator) checkpoints its state. In your example, the three flatmap operators are stateless, so there is no state to be checkpointed. The Kafka source is stateful and checkpoints the reading offsets for all partitions.

In case of a failure, the job is recovered and all tasks load their state, which means, in the case of the source operator, that the reading offsets are reset. Hence, the application will reprocess all events since the last checkpoint.
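To make this concrete, here is a minimal simulation in plain Python (not Flink code; all function names are made up for illustration). The only state a checkpoint records for this pipeline is the source's reading offset, so after a failure every event since the last checkpoint flows through all three flatMaps again, regardless of how far an in-flight event had already progressed:

```python
# Toy model of checkpoint/restore for a stateless pipeline.
# The only checkpointed state is the source offset (as with Flink's Kafka source).

def flat_map_1(x): return x * 2        # stand-ins for the three
def flat_map_2(x): return x + 1        # stateless flatMap operators
def flat_map_3(x): return x * 10

def run(events, start_offset, crash_at=None):
    """Process events from start_offset; optionally 'crash' mid-stream."""
    processed = []
    for offset in range(start_offset, len(events)):
        if offset == crash_at:
            # Failure while this event is in flight: nothing about its
            # partial progress (e.g. "already past flatMap 2") survives.
            return processed
        processed.append(flat_map_3(flat_map_2(flat_map_1(events[offset]))))
    return processed

events = [1, 2, 3, 4]
checkpointed_offset = 2          # last completed checkpoint saved offset 2

# Crash while the event at offset 3 is in flight:
before_crash = run(events, start_offset=0, crash_at=3)

# Recovery: the source rewinds to the checkpointed offset; events at
# offsets 2 and 3 re-run through flatMap 1 -> 2 -> 3 from the beginning.
after_recovery = run(events, start_offset=checkpointed_offset)
print(after_recovery)            # [70, 90] -- both events fully reprocessed
```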

In order to achieve end-to-end exactly-once, you need a special sink connector that offers either transaction support (e.g., for Kafka) or supports idempotent writes.
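The idempotent-write option can be sketched like this (again plain Python with hypothetical names, not a real connector): writes are keyed on a stable record id, so events replayed after recovery overwrite rather than duplicate:

```python
class IdempotentSink:
    """Toy idempotent sink: writes are keyed by record id, so replays are harmless."""
    def __init__(self):
        self.store = {}

    def write(self, record_id, value):
        self.store[record_id] = value   # upsert: a replayed write just overwrites

sink = IdempotentSink()
for record_id, value in [(0, "a"), (1, "b")]:
    sink.write(record_id, value)

# After a failure, the source rewinds and record 1 is emitted a second time:
sink.write(1, "b")

print(sorted(sink.store.items()))      # [(0, 'a'), (1, 'b')] -- no duplicate
```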



Source: https://stackoverflow.com/questions/54986886/understanding-flink-savepoints-checkpoints
