I have partitioned data in HDFS. At some point I decide to update it. The algorithm is:
In the end I decided to delete that "green" subset of partitions from HDFS myself and use SaveMode.Append instead. I think this is a bug in Spark.
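A minimal sketch of the delete-then-append workaround. So that it runs without a cluster, it simulates HDFS with a local directory and uses only Python's standard library, not Spark. Partitions are Hive-style `key=value` subdirectories, the layout `partitionBy` produces. The function names `write_partitioned`, `replace_partitions` and `read_partition` are hypothetical. Only the partitions that appear in the new data are deleted before the append, so every other partition keeps its existing files.

```python
import json
import os
import shutil
import tempfile


def write_partitioned(root, rows, key):
    """Append rows into Hive-style partition directories root/key=value/.

    Mimics SaveMode.Append: each call adds a new part file and never
    touches files that already exist in a partition.
    """
    by_value = {}
    for row in rows:
        by_value.setdefault(row[key], []).append(row)
    for value, part_rows in by_value.items():
        part_dir = os.path.join(root, f"{key}={value}")
        os.makedirs(part_dir, exist_ok=True)
        # Number the new file after the existing ones so nothing is overwritten.
        n = len(os.listdir(part_dir))
        with open(os.path.join(part_dir, f"part-{n:05d}.json"), "w") as f:
            for r in part_rows:
                f.write(json.dumps(r) + "\n")


def replace_partitions(root, new_rows, key):
    """Workaround: delete only the partitions touched by new_rows, then append."""
    touched = {row[key] for row in new_rows}
    for value in touched:
        part_dir = os.path.join(root, f"{key}={value}")
        if os.path.isdir(part_dir):
            shutil.rmtree(part_dir)  # stands in for deleting the partition on HDFS
    write_partitioned(root, new_rows, key)


def read_partition(root, key, value):
    """Read back every row stored in one partition directory."""
    part_dir = os.path.join(root, f"{key}={value}")
    rows = []
    for name in sorted(os.listdir(part_dir)):
        with open(os.path.join(part_dir, name)) as f:
            rows.extend(json.loads(line) for line in f if line.strip())
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        write_partitioned(root, [{"day": "a", "v": 1}, {"day": "b", "v": 2}], "day")
        # New data touches partitions b (existing) and c (new); a must survive.
        replace_partitions(root, [{"day": "b", "v": 20}, {"day": "c", "v": 3}], "day")
        print(sorted(os.listdir(root)))  # ['day=a', 'day=b', 'day=c']
        print(read_partition(root, "day", "b"))
```

In Spark itself, the delete step would go through the Hadoop `FileSystem.delete(path, true)` call and the write through `df.write.mode(SaveMode.Append).partitionBy(...)`. On Spark 2.3 and later, setting `spark.sql.sources.partitionOverwriteMode` to `dynamic` makes `SaveMode.Overwrite` replace only the partitions present in the DataFrame being written, which may make the manual delete unnecessary.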