Spark SQL SaveMode.Overwrite, getting java.io.FileNotFoundException and requiring 'REFRESH TABLE tableName'

孤独总比滥情好 2020-12-08 11:37

In Spark SQL, how should we read data from a folder in HDFS, modify it, and save the updated data back to the same HDFS folder using the Overwrite save mode?

4 answers
  •  猫巷女王i
    2020-12-08 12:07

    // Keep only rows whose dsctimestamp falls within the last 30 minutes (1,800,000 ms)
    val dfOut = df.filter(r => r.getAs[Long]("dsctimestamp") > (System.currentTimeMillis() - 1800000))

    In the line above, df was backed by an underlying Hadoop partition. Once I had applied this transformation (producing dfOut), I could not find a way to delete, rename, or overwrite that partition until dfOut had been garbage collected.
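    A likely reason the old partition cannot simply be overwritten: Spark evaluates DataFrames lazily, so dfOut is only a query plan whose leaf is still a scan of df's files, and nothing is read until an action such as a write runs. Overwriting the same folder therefore asks Spark to remove the very files the pending plan still has to read; depending on the Spark version and how the data is read, this is either rejected up front or fails partway with a FileNotFoundException like the one in the title. The sketch below assumes Parquet input with a Long `dsctimestamp` column; the `LazyLineage` object and `recentRows` helper are hypothetical names. It uses `Dataset.inputFiles` to print the files dfOut still depends on.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical example: shows that a filtered DataFrame still points at its source files.
object LazyLineage {

  // The answer's 30-minute filter, written with the Column API.
  // This only builds a query plan; no data is read here.
  def recentRows(df: DataFrame, nowMillis: Long): DataFrame =
    df.filter(col("dsctimestamp") > nowMillis - 1800000L)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("lazy-lineage").getOrCreate()
    val path = args(0) // hypothetical input folder of Parquet files

    val df = spark.read.parquet(path)
    val dfOut = recentRows(df, System.currentTimeMillis())

    // Best-effort snapshot of the files dfOut's plan will scan when an action runs.
    // They are df's files under `path`, so deleting or overwriting `path`
    // before dfOut has been written removes data that dfOut still needs.
    dfOut.inputFiles.foreach(println)

    spark.stop()
  }
}
```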

    My solution was to keep the old partition, create a new partition for dfOut, flag the new partition as current, and then delete the old partition some time later, once dfOut had been garbage collected.
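    A minimal sketch of this approach, assuming Parquet data and a layout the answer does not specify: each version of the data lives in its own directory under a base path, and a small marker file records which version is current. The `VersionedRewrite` object, the `_CURRENT` marker, and the `v_<millis>` directory names are all hypothetical; the file operations use the Hadoop `FileSystem` API (`open`, `create`, `delete`).

```scala
import java.nio.charset.StandardCharsets
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.col

// Sketch of: write dfOut to a new directory, flag it as current, delete the old one later.
// Hypothetical layout (not from the original answer):
//   <basePath>/_CURRENT   -> text file holding the name of the current version directory
//   <basePath>/v_<millis> -> one directory per version of the data
object VersionedRewrite {

  private def markerPath(basePath: String): Path = new Path(basePath, "_CURRENT")

  // Name of the version directory that readers should currently use.
  def currentVersion(fs: FileSystem, basePath: String): String = {
    val in = fs.open(markerPath(basePath))
    try scala.io.Source.fromInputStream(in, "UTF-8").mkString.trim
    finally in.close()
  }

  // Flag `version` as current by overwriting the marker file (true = overwrite).
  def setCurrentVersion(fs: FileSystem, basePath: String, version: String): Unit = {
    val out = fs.create(markerPath(basePath), true)
    try out.write(version.getBytes(StandardCharsets.UTF_8))
    finally out.close()
  }

  // Meant to run "some given time later", once no job still reads `version` (true = recursive).
  def deleteVersion(fs: FileSystem, basePath: String, version: String): Boolean =
    fs.delete(new Path(basePath, version), true)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("versioned-rewrite").getOrCreate()
    val basePath = args(0) // hypothetical base directory, e.g. on HDFS
    val fs = new Path(basePath).getFileSystem(spark.sparkContext.hadoopConfiguration)

    val oldVersion = currentVersion(fs, basePath)
    val df = spark.read.parquet(new Path(basePath, oldVersion).toString)
    val dfOut = df.filter(col("dsctimestamp") > System.currentTimeMillis() - 1800000L)

    // Writing to a fresh directory never touches the files that df/dfOut read from.
    val newVersion = s"v_${System.currentTimeMillis()}"
    dfOut.write.mode(SaveMode.ErrorIfExists).parquet(new Path(basePath, newVersion).toString)

    // Point readers at the new data only after the write has succeeded.
    setCurrentVersion(fs, basePath, newVersion)

    // The old version is left in place; a later cleanup run would call
    // deleteVersion(fs, basePath, oldVersion).
    spark.stop()
  }
}
```

    Readers would call `currentVersion` first and then read that directory, so they never see a half-written version. Overwriting the marker with `create(..., true)` is not an atomic swap, and deciding when it is safe to call `deleteVersion` is left to the caller, as in the answer.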

    This may not be an ideal solution, and I would love to learn a less tortuous way of addressing the issue, but it works.
