Spark streaming tab disappears after restarting from checkpoint

Submitted by 喜欢而已 on 2019-12-10 10:59:00

Question


I have a Spark Streaming job running on a cluster (Spark 1.6) that checkpoints to S3. When I first start the job, I can see the "Streaming" tab in the Spark UI. However, when I restart the job from the checkpoint, the Streaming tab disappears. The job still works as a streaming job, and I can see batches appear at the configured batch interval. See below.

If I clear out the checkpoint data, the tab comes back. I suspect that the Streaming tab is not registered correctly while restarting from a checkpoint.

I looked at the Spark Streaming code. Is it possible that the flow that registers the tab is not invoked when the application state is deserialized from a checkpoint?

Does anyone know how to fix this?


Answer 1:


> If I clear out the checkpoint data, the tab comes back. I suspect that the Streaming tab is not registered correctly while restarting from a checkpoint.

It is invoked, but the Streaming tab doesn't appear until the job has finished loading all the data from the S3 checkpoint location. If your lineage is long, loading may take some time. Once all the data has been restored from the checkpoint, the Streaming tab will appear.
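For reference, a Spark Streaming job that restarts from a checkpoint is usually written with `StreamingContext.getOrCreate`. The sketch below assumes the Spark 1.6 Scala streaming API; the S3 path, the socket input source, and the 10-second batch interval are hypothetical placeholders, not details taken from the question.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedStreamingApp {
  // Hypothetical checkpoint location; substitute your own bucket and path.
  val checkpointDir = "s3n://my-bucket/checkpoints/streaming-app"

  // Builds a fresh context. getOrCreate calls this ONLY when no checkpoint
  // data exists at checkpointDir; on restart the context is deserialized instead.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointedStreamingApp")
    val ssc = new StreamingContext(conf, Seconds(10)) // assumed 10 s batch interval
    ssc.checkpoint(checkpointDir)

    // Hypothetical input source. All DStream setup must live inside this
    // function so that the graph can be recovered from the checkpoint.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Restores the context from checkpointDir if data is present there,
    // otherwise builds a new one by calling createContext.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

On the first run `createContext` executes; on a restart with data already in `checkpointDir` it is not called, and Spark rebuilds the DStream graph from the checkpoint instead. As described in the answer, the Streaming tab only shows up after that restore has completed, so with a long lineage it can be missing for a while after the restart.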



Source: https://stackoverflow.com/questions/36692827/spark-streaming-tab-disappears-after-restarting-from-checkpoint
