How does Structured Streaming ensure exactly-once writing semantics for file sinks?

情书的邮戳 2021-01-15 22:41

I am writing a storage writer for Spark Structured Streaming which will partition the given DataFrame and write it to a different blob store account. The Spark documentation sa…

2 Answers
  •  情书的邮戳
    2021-01-15 22:57

    When you use foreachBatch, Spark guarantees only that foreachBatch is called once per micro-batch in the normal case. If an exception is thrown while foreachBatch is executing, Spark will retry the same batch (with the same batchId). So if you write to multiple storages and the failure happens after some of the writes have already succeeded, the retry produces duplicates. You therefore have to handle failures during writing yourself, typically by making the writes idempotent, to avoid duplication.
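    As an illustration, here is a minimal Scala sketch of the usual idempotent-by-batchId pattern: each batch overwrites an output directory derived from batchId, so a retried batch replaces its own (possibly partial) output instead of appending duplicates. The source, storage accounts, containers, and paths below are placeholders, not anything from the question.

        import org.apache.spark.sql.{DataFrame, SparkSession}

        val spark = SparkSession.builder().appName("dual-blob-writer").getOrCreate()

        // Placeholder source; substitute the real streaming DataFrame.
        val events = spark.readStream.format("rate").load()

        val query = events.writeStream
          .foreachBatch { (batch: DataFrame, batchId: Long) =>
            // Cache so both writes see the same data without recomputing the source.
            batch.persist()
            // Embedding batchId in the path makes a retried batch overwrite its
            // own previous output rather than append duplicates.
            batch.write.mode("overwrite")
              .parquet(s"wasbs://data@account1.blob.core.windows.net/out/batch=$batchId")
            batch.write.mode("overwrite")
              .parquet(s"wasbs://data@account2.blob.core.windows.net/out/batch=$batchId")
            batch.unpersist()
          }
          .option("checkpointLocation", "/checkpoints/dual-blob-writer")
          .start()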

    In my practice, when I need to store to multiple storages I create a custom sink using the DataSource API v2, which supports commit: tasks stage their output, and the driver publishes it only once the whole epoch commits.
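    For reference, a minimal Scala skeleton of where that commit hook lives in the Spark 3.x connector (DataSource V2) API might look like the following. The class name, the StagedFiles message, and the publish/cleanup logic are hypothetical, and the per-task writer factory is omitted.

        import org.apache.spark.sql.connector.write.{PhysicalWriteInfo, WriterCommitMessage}
        import org.apache.spark.sql.connector.write.streaming.{StreamingDataWriterFactory, StreamingWrite}

        // Hypothetical commit message: the staged (not yet visible) files a task wrote.
        case class StagedFiles(paths: Seq[String]) extends WriterCommitMessage

        class DualBlobStreamingWrite extends StreamingWrite {

          // Factory creating per-task writers that stage output to a temporary
          // location and return StagedFiles; omitted here for brevity.
          override def createStreamingWriterFactory(info: PhysicalWriteInfo): StreamingDataWriterFactory = ???

          // Runs on the driver once per epoch, only after every task succeeded:
          // atomically publish the staged files, recording epochId so that a
          // replayed epoch becomes a no-op (this is where exactly-once comes from).
          override def commit(epochId: Long, messages: Array[WriterCommitMessage]): Unit = {
            // e.g. move each StagedFiles path to its final location, skipping
            // epochs already recorded as committed.
          }

          // Runs if any task fails: clean up staged output so a retry starts fresh.
          override def abort(epochId: Long, messages: Array[WriterCommitMessage]): Unit = {
            // e.g. delete every path carried by the StagedFiles messages.
          }
        }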
