Spark: Saving RDD in an already existing path in HDFS
Question: I am able to save RDD output to HDFS with the saveAsTextFile method, but that method throws an exception if the target path already exists. I have a use case where I need to save RDDs to a file path in HDFS that already exists. Is there a way to append the new RDD data to the data already present at that path?

Answer 1: One possible solution, available since Spark 1.6, is to convert the RDD to a DataFrame and write it in text format with append mode:

```scala
// requires the implicits in scope for toDF:
// import sqlContext.implicits._ on Spark 1.6, or import spark.implicits._ on 2.x+
val outputPath: String = ???
rdd.map(_.toString).toDF
  .write
  .mode("append")
  .text(outputPath)
```
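If switching to the DataFrame writer is not an option, another common workaround (not part of the original answer, just a sketch under assumptions) is to give each batch its own unique subdirectory under the base path, so saveAsTextFile never sees an existing directory, and to read everything back with a glob. The paths and app name below are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AppendViaSubdirs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("append-demo").setMaster("local[*]"))

    val basePath = "hdfs:///data/output"  // hypothetical base directory
    val rdd = sc.parallelize(Seq("a", "b", "c"))

    // Unique per-run subdirectory, e.g. hdfs:///data/output/batch-1700000000000,
    // so saveAsTextFile never collides with an earlier run's output
    val runPath = s"$basePath/batch-${System.currentTimeMillis}"
    rdd.saveAsTextFile(runPath)

    // Downstream jobs can read all batches at once with a glob:
    // sc.textFile(s"$basePath/batch-*")

    sc.stop()
  }
}
```

The trade-off is that the data ends up spread over many small directories; a periodic compaction job may be needed if batches are frequent.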