AWS EMR Spark: Error writing to S3 - IllegalArgumentException - Cannot create a path from an empty string

萌比男神i 2021-01-16 00:20

I have been trying to fix this for a long time now... no idea why I get this. FYI, I'm running Spark on an AWS EMR cluster. I debugged and clearly see the destination…

1 Answer
  • 2021-01-16 01:01

    I have seen a similar problem when writing parquet files to S3. The culprit is SaveMode.Overwrite, which doesn't seem to work correctly in combination with S3. Try deleting all the data in your S3 bucket my-bucket-name before writing into it; then your code should run successfully.
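
    For reference, a write along these lines is what typically triggers the error. This is only an illustration: the DataFrame name df and the output prefix are placeholders, not taken from the original question.

    # Hypothetical write that hits the error with SaveMode.Overwrite on S3;
    # `df` and the output prefix are placeholders, not from the question.
    df.write \
        .mode("overwrite") \
        .parquet("s3a://my-bucket-name/output/")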

    To delete all files from your bucket my-bucket-name you can use the following PySpark code:

    # see https://www.quora.com/How-do-you-overwrite-the-output-directory-when-using-PySpark
    # Access the JVM classes through the py4j gateway of the active SparkContext `sc`
    URI = sc._gateway.jvm.java.net.URI
    Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
    FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
    
    # see http://crazyslate.com/how-to-rename-hadoop-files-using-wildcards-while-patterns/
    # Get a handle on the bucket's filesystem, then delete every top-level entry;
    # the second argument `True` makes each delete recursive
    fs = FileSystem.get(URI("s3a://my-bucket-name"), sc._jsc.hadoopConfiguration())
    file_status = fs.globStatus(Path("/*"))
    for status in file_status:
        fs.delete(status.getPath(), True)
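
    If you prefer to stay outside the JVM gateway, a minimal boto3 sketch achieves the same cleanup, assuming AWS credentials are already configured on the cluster and using the same placeholder bucket name as above:

    import boto3
    
    # Batch-delete every object in the bucket before the Spark job writes to it
    # (bucket name is a placeholder; works for unversioned buckets)
    s3 = boto3.resource("s3")
    s3.Bucket("my-bucket-name").objects.all().delete()

    After the cleanup, re-running the write with mode("overwrite") should succeed.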
    