Spark: Saving RDD in an already existing path in HDFS

Submitted by 微笑、不失礼 on 2019-12-02 10:17:37

Question


I am able to save the RDD output to HDFS with the saveAsTextFile method. This method throws an exception if the file path already exists.

I have a use case where I need to save the RDDs to an already existing file path in HDFS. Is there a way to just append the new RDD data to the data that already exists at the same path?
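For illustration, a minimal sketch of the failing pattern (the SparkContext `sc` and the HDFS path are hypothetical):

val rdd = sc.parallelize(Seq("a", "b", "c"))

// Throws a FileAlreadyExistsException when the output directory
// hdfs:///data/out already exists.
rdd.saveAsTextFile("hdfs:///data/out")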


Answer 1:


One possible solution, available since Spark 1.6, is to use DataFrames with text format and append mode:

import sqlContext.implicits._  // needed for toDF on Spark 1.6; use spark.implicits._ on 2.x

val outputPath: String = ???

// Convert each record to a single string column and append it to the existing path
rdd.map(_.toString).toDF.write.mode("append").text(outputPath)
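In append mode Spark writes the new records as additional part files alongside the ones already present in the directory, so the existing data is left untouched.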


Source: https://stackoverflow.com/questions/38663536/spark-saving-rdd-in-an-already-existing-path-in-hdfs
