How to write Spark Streaming output to HDFS without overwriting

北荒 2020-12-20 00:48

After some processing I have a DStream[String, ArrayList[String]]. When I write it to HDFS using saveAsTextFile, every batch overwrites the data from the previous one, so h

4 answers
  •  青春惊慌失措
    2020-12-20 01:30

    Here is how I solved the issue without using a DataFrame: write each batch to its own timestamped directory.

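    import java.time.LocalDateTime
    import java.time.format.DateTimeFormatter

    messages.foreachRDD { rdd =>
      val eachRdd = rdd.map(record => record.value)
      if (!eachRdd.isEmpty()) {
        // A timestamped directory per batch keeps earlier output from being overwritten.
        val stamp = DateTimeFormatter.ofPattern("yyyyMMddHHmmss").format(LocalDateTime.now())
        // RDDs are immutable: repartition returns a new RDD, so its result must be used.
        // One partition means one part file per batch directory.
        eachRdd.repartition(1).saveAsTextFile(hdfs_storage + stamp + "/")
      }
    }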
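
A possible variation on the snippet above, as a minimal sketch: derive the directory name from the batch time instead of the wall clock, so that a re-run of the same batch maps to the same path. The object name `BatchOutput`, the helper `batchPath`, and the choice of UTC are assumptions made for illustration; they are not part of the original answer.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

object BatchOutput {
  // Same pattern as the answer above; fixed to UTC (an assumption) so the
  // directory name does not depend on the driver's local time zone.
  private val fmt: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC)

  // Hypothetical helper: builds one output directory per batch from the
  // batch time in epoch milliseconds.
  def batchPath(baseDir: String, batchTimeMs: Long): String = {
    val base = if (baseDir.endsWith("/")) baseDir else baseDir + "/"
    base + fmt.format(Instant.ofEpochMilli(batchTimeMs))
  }
}
```

The batch time is available through the two-argument form of foreachRDD, for example `messages.foreachRDD { (rdd, time) => ... }`, where `time.milliseconds` can be passed to `batchPath` together with `hdfs_storage`. With a batch interval shorter than one second the pattern would need a millisecond field (`SSS`) to keep directories distinct.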
    
