How to write Spark Streaming output to HDFS without overwriting

北荒 2020-12-20 00:48

After some processing I have a DStream[String, ArrayList[String]]. When I write it to HDFS using saveAsTextFile, every batch overwrites the data from the previous batch, so h

4 answers
  •  挽巷 (original poster)
    2020-12-20 01:26

    Storing streaming output in HDFS will always create new files, even when you use append mode with Parquet, and this leads to the small-files problem on the NameNode. I would recommend writing your output to sequence files instead, where you can keep appending to the same file.
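
    To illustrate the "new files for every batch" behaviour described above, here is a minimal sketch in Scala. It does not implement the sequence-file append suggested in this answer. It only shows one common way to stop batches from overwriting each other, by giving each batch its own output directory. The names BatchOutput, batchDir, saveEachBatch and baseDir are hypothetical, and the element type is assumed to be a (String, java.util.ArrayList[String]) pair.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

object BatchOutput {
  // Hypothetical helper: build a distinct directory name from the batch time.
  def batchDir(baseDir: String, batchTimeMs: Long): String =
    s"$baseDir/batch-$batchTimeMs"

  // Assumption: the stream carries (key, values) pairs, as in the question.
  // Each non-empty micro-batch is written to its own directory, so a later
  // batch never replaces the output of an earlier one.
  def saveEachBatch(
      stream: DStream[(String, java.util.ArrayList[String])],
      baseDir: String): Unit =
    stream.foreachRDD { (rdd: RDD[(String, java.util.ArrayList[String])], time: Time) =>
      if (!rdd.isEmpty()) {
        rdd
          // Flatten each record to one tab-separated text line.
          .map { case (key, values) => s"$key\t${String.join(",", values)}" }
          .saveAsTextFile(batchDir(baseDir, time.milliseconds))
      }
    }
}
```

    Calling BatchOutput.saveEachBatch(stream, "hdfs:///some/output") (the path is a placeholder) before starting the StreamingContext would produce one directory per batch interval. DStream also has a built-in saveAsTextFiles(prefix, suffix) that names each batch's output prefix-TIME_IN_MS.suffix. Both approaches still create many small files, which is the NameNode concern raised above, so a periodic compaction job may be needed.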
