I am receiving the streaming data myDStream (DStream[String]) that I want to save in S3 (basically, for this question, it doesn't matter where exactly).
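To make that concrete, here is a minimal sketch of the setup I have in mind; the socket source and the s3a bucket path are just placeholders, not my real source or bucket:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("stream-to-s3")
val ssc = new StreamingContext(conf, Seconds(10))

// stand-in source; in reality myDStream comes from my actual receiver
val myDStream = ssc.socketTextStream("localhost", 9999) // DStream[String]

myDStream.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) {
    // one output directory per micro-batch (placeholder bucket path)
    rdd.saveAsTextFile(s"s3a://my-bucket/output/batch-${time.milliseconds}")
  }
}

ssc.start()
ssc.awaitTermination()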
As an alternative to rdd.collect.mkString("\n"), you can use the Hadoop FileSystem API to clean up the output by moving the part-00000 file into its final place. The code below works perfectly on the local filesystem and on HDFS, but I'm unable to test it with S3:
import org.apache.hadoop.fs.{FileSystem, Path}

// assumes the RDD was coalesced to a single partition, so part-00000 is the only part file
val outputPath = "path/to/some/file.json"
rdd.saveAsTextFile(outputPath + "-tmp")

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
// move the single part file to the final name, then drop the temporary directory
fs.rename(new Path(outputPath + "-tmp/part-00000"), new Path(outputPath))
fs.delete(new Path(outputPath + "-tmp"), true)
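For S3 I'd expect one adjustment: FileSystem.get(hadoopConfiguration) returns the cluster's default filesystem (usually HDFS), so the FileSystem should probably be resolved from the output path itself. A sketch, assuming an s3a URI with a placeholder bucket name:

import org.apache.hadoop.fs.Path

val outputPath = "s3a://my-bucket/path/to/some/file.json" // placeholder bucket
rdd.saveAsTextFile(outputPath + "-tmp")

// resolve the FileSystem that owns this path (S3A here), not the default one
val fs = new Path(outputPath).getFileSystem(spark.sparkContext.hadoopConfiguration)
fs.rename(new Path(outputPath + "-tmp/part-00000"), new Path(outputPath))
fs.delete(new Path(outputPath + "-tmp"), true)

Keep in mind that on S3 a rename is implemented as a copy followed by a delete, so it is neither atomic nor free for large files.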