Question
I am able to save RDD output to HDFS with the saveAsTextFile method, but this method throws an exception if the output path already exists.
I have a use case where I need to save RDDs to an already existing path in HDFS. Is there a way to append the new RDD data to the data already stored at that path?
Answer 1:
One possible solution, available since Spark 1.6, is to convert the RDD to a DataFrame and write it using the text format with append mode:
import spark.implicits._ // required for .toDF on an RDD

val outputPath: String = ???
rdd.map(_.toString).toDF.write.mode("append").text(outputPath)
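For context, here is a minimal end-to-end sketch of the same idea. The application name, sample data, and output path are illustrative, and it assumes a working Spark/HDFS environment:

```scala
import org.apache.spark.sql.SparkSession

object AppendRddExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("append-rdd-to-hdfs") // illustrative name
      .getOrCreate()
    import spark.implicits._ // provides .toDF on RDDs

    // Illustrative input RDD of strings.
    val rdd = spark.sparkContext.parallelize(Seq("a", "b", "c"))
    val outputPath = "hdfs:///tmp/append-example" // illustrative path

    // "append" mode writes new part files into the directory
    // instead of failing when it already exists.
    rdd.map(_.toString).toDF.write.mode("append").text(outputPath)

    spark.stop()
  }
}
```

Note that append mode adds new part files alongside the existing ones in the directory; it does not merge records into the files that are already there.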
Source: https://stackoverflow.com/questions/38663536/spark-saving-rdd-in-an-already-existing-path-in-hdfs