In Spark SQL, how should we read data from a folder in HDFS, apply some modifications, and save the updated data back to the same HDFS folder using the Overwrite save mode?
Why not just cache it after reading it? Writing to a different directory and then moving that directory back may require extra permissions. I have also been forcing an action after the cache, such as a show(), so the data is actually materialized before the overwrite.
val myDF = spark.read.format("csv")
  .option("header", "false")
  .option("delimiter", ",")
  .load("/directory/tofile/")

// Cache the DataFrame and force an action so the source files
// are read into memory before the directory is overwritten
myDF.cache()
myDF.show(2)
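To complete the round trip, the cached DataFrame can then be transformed and written back to the same path with Overwrite mode. A sketch of that step, assuming a plain CSV layout; the column name `_c0` (Spark's default for headerless CSV) and the upper-casing transformation are purely illustrative:

```scala
import org.apache.spark.sql.functions.{col, upper}

// Hypothetical modification: upper-case the first column
val updatedDF = myDF.withColumn("_c0", upper(col("_c0")))

// Overwrite the original directory; the cache above means Spark
// should not need to re-read the files it is about to replace
updatedDF.write
  .format("csv")
  .option("header", "false")
  .option("delimiter", ",")
  .mode("overwrite")
  .save("/directory/tofile/")
```

One caveat: caching is best-effort, so if the cached blocks are evicted before the write finishes, Spark would try to re-read files that are being deleted. Writing to a temporary directory and renaming afterwards is the safer pattern when the extra permissions are available.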