Why do I get a FileNotFoundException while writing to S3?

别等时光非礼了梦想. Submitted on 2020-03-05 00:22:42

Question


I'm using Spark-SQL-2.3.1, Kafka, and Java 8 in my project, and would like to use AWS S3 as storage.

I am writing the data consumed from a Kafka topic into an S3 bucket as below:

   ds.writeStream()
     .format("parquet")
     .option("path", parquetFileName)
     .option("mergeSchema", true)
     .outputMode("append")
     .partitionBy("company_id")
     .option("checkpointLocation", checkPtLocation)
     .trigger(Trigger.ProcessingTime("25 seconds"))
     .start();

But while writing I get a FileNotFoundException:

Caused by: java.io.FileNotFoundException: No such file or directory: s3a://company_id=216231245/part-00055-f4f87dc9-a620-41bd-9380-de4ba7e70efb.c000.snappy.parquet
  at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1931)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1822)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1763)

I wonder why I'm getting a FileNotFoundException when writing. I am not reading from S3, right? So what is happening here, and how do I fix it?


Answer 1:


This is because S3 is not a file system but an object store. Unlike HDFS, it does not support the semantics required for an atomic rename. Spark first writes the output files to a temporary folder and then renames them into place. There is no atomic way of doing this in S3, which is why you will see these errors at times.
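To make the write-then-rename pattern concrete, here is a minimal sketch of it on a local file system, where a rename really is atomic. It is only an illustration, not Spark's actual committer code: the class name `RenameCommitSketch`, the method `writeThenRename`, and the `_temporary` directory name used here are all assumptions chosen for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative only: mimics the write-to-temporary-then-rename commit pattern
// on a local file system. This is not Spark's actual committer code.
class RenameCommitSketch {

    // Writes the data under a (hypothetical) "_temporary" directory, then
    // renames the file into its final location and returns the final path.
    static Path writeThenRename(Path outputDir, String fileName, String data) throws IOException {
        Path tempDir = Files.createDirectories(outputDir.resolve("_temporary"));
        Path tempFile = tempDir.resolve(fileName);
        Files.write(tempFile, data.getBytes(StandardCharsets.UTF_8));

        Path finalFile = outputDir.resolve(fileName);
        // On a real file system this is a single atomic metadata operation.
        // An object store has no equivalent: a "rename" there is emulated as
        // copy-then-delete, so it is neither atomic nor cheap.
        Files.move(tempFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
        return finalFile;
    }

    public static void main(String[] args) throws IOException {
        Path outputDir = Files.createTempDirectory("commit-sketch");
        Path committed = writeThenRename(outputDir, "part-00000.parquet", "example bytes");
        System.out.println("committed file exists: " + Files.exists(committed));
        System.out.println("temporary file still exists: "
                + Files.exists(outputDir.resolve("_temporary").resolve("part-00000.parquet")));
    }
}
```

On a local disk or HDFS the final file appears in one step. On S3 the same logical step becomes several separate requests, and a status check issued in between can fail with an exception like the one in the question.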

To fix this, if your environment allows, you could use HDFS as intermediate storage and then move the files to S3 for later processing.

If you are on Hadoop 3.1 or later, you could use the S3A committers shipped with it. More details on how to configure them can be found here.
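As a rough idea of what that configuration involves, the sketch below collects the settings commonly listed for enabling an S3A committer from Spark. The helper class `S3ACommitterSettings` and its method `forCommitter` are hypothetical names made up for this example. The two `org.apache.spark.internal.io.cloud` binding classes come from Spark's optional `spark-hadoop-cloud` module, which is assumed to be on the classpath and may not be available for Spark 2.3.1. Whether a Structured Streaming file sink honours these settings is also an assumption you should verify for your versions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: gathers the Spark settings usually needed to switch
// to an S3A committer. Check the keys against your Spark/Hadoop versions.
class S3ACommitterSettings {

    // committerName is one of the S3A committers: "directory", "partitioned" or "magic".
    static Map<String, String> forCommitter(String committerName) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Selects which S3A committer to use.
        conf.put("spark.hadoop.fs.s3a.committer.name", committerName);
        // Routes s3a:// output through the S3A committer factory.
        conf.put("spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a",
                "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory");
        // Binding classes from the spark-hadoop-cloud module (assumed present).
        conf.put("spark.sql.sources.commitProtocolClass",
                "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol");
        conf.put("spark.sql.parquet.output.committer.class",
                "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter");
        return conf;
    }

    public static void main(String[] args) {
        // Prints the settings in the form accepted by spark-submit.
        forCommitter("directory").forEach(
                (key, value) -> System.out.println("--conf " + key + "=" + value));
    }
}
```

Each entry can be passed to `spark-submit` as a `--conf` argument, or applied in code with `SparkSession.builder().config(key, value)` before the session is created.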

If you are on an older version of Hadoop, you could use an S3 output committer for Spark, which essentially uses S3's multipart upload to mimic the rename. One such committer I am aware of is this. It does not appear to have been updated recently, though. There may be other options too.



Source: https://stackoverflow.com/questions/60201672/while-writing-to-s3-why-i-get-filenotfoundexception