Scala Spark - overwrite parquet file failed to delete file or dir

Submitted by 瘦欲 on 2019-12-08 10:30:28

Question


I'm trying to create parquet files locally for several days of data. The first time I run the code, everything works fine. The second time, it fails to delete a file; the third time, it fails to delete another file. It's totally random which file cannot be deleted.

The reason I need this to work is that I want to create parquet files every day for the last seven days, so the parquet files that are already there should be overwritten with the updated data.

I use Project SDK 1.8, Scala version 2.11.8 and Spark version 2.0.2.

After running this line of code a second time:

newDF.repartition(1).write.mode(SaveMode.Overwrite).parquet(
    OutputFilePath + "/day=" + DateOfData)

this error occurs:

WARN FileUtil: 
Failed to delete file or dir [C:\Users\...\day=2018-07-15\._SUCCESS.crc]: 
it still exists.
Exception in thread "main" java.io.IOException: 
Unable to clear output directory file:/C:/Users/.../day=2018-07-15 
prior to writing to it
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:91)

After the third time:

WARN FileUtil: Failed to delete file or dir 
[C:\Users\day=2018-07-20\part-r-00000-8d1a2bde-c39a-47b2-81bb-decdef8ea2f9.snappy.parquet]: it still exists.
Exception in thread "main" java.io.IOException: Unable to clear output directory file:/C:/Users/day=2018-07-20 prior to writing to it
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:91)

As you can see, it's a different file than in the second run, and so on. After deleting the files manually, all parquet files can be created.

Does anybody know this issue and how to fix it?

Edit: It's always a .crc file that can't be deleted.


Answer 1:


Thanks for your answers. :) The solution is not to write into the Users directory; there seems to be a permission problem there. So I created a new folder directly under C: and it works perfectly.
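
A minimal sketch of that fix, reusing the write call from the question; the folder name C:\SparkOutput is a hypothetical example, not from the original answer:

import org.apache.spark.sql.SaveMode

// Point the output at a folder outside C:\Users (C:\SparkOutput is an
// illustrative choice) so Spark can clear the old files on overwrite.
val OutputFilePath = "C:/SparkOutput"

newDF.repartition(1)
  .write
  .mode(SaveMode.Overwrite)
  .parquet(OutputFilePath + "/day=" + DateOfData)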




Answer 2:


Perhaps another Windows process has a lock on the file so it can't be deleted.
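
If a lock is the cause, one way to sidestep it is to avoid the delete altogether by writing each run into a fresh directory, for example keyed by a timestamp. This is a sketch of that idea, not part of the original answer; the run= path scheme is a hypothetical choice:

import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.SaveMode

// Write each run into its own directory so Spark never has to delete
// files that another Windows process may still hold open.
val runId = LocalDateTime.now.format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"))

newDF.repartition(1)
  .write
  .mode(SaveMode.ErrorIfExists) // fail fast instead of clearing the directory
  .parquet(OutputFilePath + "/run=" + runId + "/day=" + DateOfData)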



Source: https://stackoverflow.com/questions/51561061/scala-spark-overwrite-parquet-file-failed-to-delete-file-or-dir
