I am writing a storage writer for Spark Structured Streaming which will partition the given dataframe and write it to different blob store accounts. The Spark documentation says that foreachBatch provides only at-least-once write guarantees. How can I avoid duplicate writes across the storage accounts if a batch fails partway through?
When you use foreachBatch, Spark normally calls it once per micro-batch, but if an exception is thrown while the batch is being processed, Spark retries the same batch (with the same batchId). So if you write to multiple storages inside foreachBatch and one write succeeds before another fails, the retry can produce duplicates. You have to handle failures yourself, for example by making each write idempotent using the batchId, to avoid duplication.
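A minimal sketch of the idempotent approach, assuming Parquet output and made-up ABFS paths and checkpoint location: partition the output by the batchId and overwrite only that partition, so a retried batch replaces whatever the failed attempt wrote instead of appending to it.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().appName("multi-store-writer").getOrCreate()

// Hypothetical source and ABFS paths -- replace with your own.
val streamingDF = spark.readStream.format("rate").load()
val primaryPath   = "abfss://data@primaryaccount.dfs.core.windows.net/events"
val secondaryPath = "abfss://data@secondaryaccount.dfs.core.windows.net/events"

val query = streamingDF.writeStream
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    // Cache so the micro-batch is not recomputed once per sink.
    batchDF.persist()
    try {
      // Partition by batchId and overwrite only that partition, so a
      // retried batch replaces the earlier (possibly partial) output
      // instead of appending duplicates.
      val withBatch = batchDF.withColumn("batch_id", lit(batchId))
      Seq(primaryPath, secondaryPath).foreach { path =>
        withBatch.write
          .mode(SaveMode.Overwrite)
          .option("partitionOverwriteMode", "dynamic")
          .partitionBy("batch_id")
          .parquet(path)
      }
    } finally {
      batchDF.unpersist()
    }
  }
  .option("checkpointLocation", "/tmp/checkpoints/multi-store") // hypothetical
  .start()
```

The "partitionOverwriteMode" = "dynamic" option limits the overwrite to the partitions present in the batch being written, so earlier batches stay untouched while a retry of the current batch is replayed cleanly.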
In my own practice, when I needed to store to multiple storages I created a custom sink using the DataSource V2 API, which supports commit/abort semantics, as sketched below.
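For reference, a skeleton of the DataSource V2 streaming write hooks might look like the following (assuming Spark 3.x; the TableProvider/Table/WriteBuilder wiring needed to register the connector is omitted, and MultiStoreStreamingWrite, StagedFiles, etc. are made-up names). The point is that data staged by tasks is only published in commit, and abort lets you clean up after a failed epoch:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.write.{DataWriter, PhysicalWriteInfo, WriterCommitMessage}
import org.apache.spark.sql.connector.write.streaming.{StreamingDataWriterFactory, StreamingWrite}

// Hypothetical commit message carrying the files a task has staged.
case class StagedFiles(paths: Seq[String]) extends WriterCommitMessage

class MultiStoreStreamingWrite(primaryPath: String, secondaryPath: String) extends StreamingWrite {

  override def createStreamingWriterFactory(info: PhysicalWriteInfo): StreamingDataWriterFactory =
    new MultiStoreWriterFactory(primaryPath, secondaryPath)

  // Called once per epoch on the driver after all tasks have committed.
  // This is where staged data is atomically "published" to both storage
  // accounts (e.g. rename temp files, write a success marker).
  override def commit(epochId: Long, messages: Array[WriterCommitMessage]): Unit = {
    // publish staged files for this epoch ...
  }

  // Called when the epoch fails; remove anything that was staged so a
  // retried epoch does not leave duplicates behind.
  override def abort(epochId: Long, messages: Array[WriterCommitMessage]): Unit = {
    // delete staged files for this epoch ...
  }
}

class MultiStoreWriterFactory(primaryPath: String, secondaryPath: String)
    extends StreamingDataWriterFactory {
  override def createWriter(partitionId: Int, taskId: Long, epochId: Long): DataWriter[InternalRow] =
    new MultiStoreDataWriter(primaryPath, secondaryPath, partitionId, epochId)
}

class MultiStoreDataWriter(primaryPath: String, secondaryPath: String,
                           partitionId: Int, epochId: Long) extends DataWriter[InternalRow] {
  // A real writer would stream rows to temporary files under both accounts.
  override def write(record: InternalRow): Unit = { /* append to staging files */ }
  override def commit(): WriterCommitMessage = StagedFiles(Seq.empty)
  override def abort(): Unit = { /* remove this task's staging files */ }
  override def close(): Unit = { /* release file handles */ }
}
```

This is more work than foreachBatch, but it gives you explicit commit/abort points per epoch, which is what makes writing to multiple storages without duplicates manageable.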