Spark FileAlreadyExistsException on Stage Failure
Question

I am trying to write a DataFrame to an S3 location after repartitioning. Whenever the write stage fails and Spark retries the stage, it throws a FileAlreadyExistsException. When I resubmit the job, it works fine as long as Spark completes the stage in one try. Below is my code block:

    df.repartition(<some-value>).write.format("orc").option("compression", "zlib").mode("Overwrite").save(path)

I believe Spark should remove the files from the failed stage before retrying. I understand this will be solved if we set