How can I make (Spark1.6) saveAsTextFile to append existing file?

难免孤独 2020-12-17 04:21

In Spark SQL, I use DF.write.mode(SaveMode.Append).json(xxxx), but this method produces a set of part files whose names are long and random, so I can't use them easily.

3 Answers
  •  甜味超标
    2020-12-17 05:00

You can try this approach, which I found elsewhere ("Process Spark Streaming RDD and store to a single HDFS file"):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
    import org.apache.spark.rdd.RDD
    
    // Write the RDD to a temporary directory, then merge the part files
    // into a single file under /final.
    def saveAsTextFileAndMerge[T](hdfsServer: String, fileName: String, rdd: RDD[T]): Unit = {
      val sourceFile = hdfsServer + "/tmp/"
      rdd.saveAsTextFile(sourceFile)
      val dstPath = hdfsServer + "/final/"
      merge(sourceFile, dstPath, fileName)
    }
    
    // Concatenate all part files under srcPath into dstPath/fileName.
    // Note: FileUtil.copyMerge exists in Hadoop 2.x but was removed in Hadoop 3.
    def merge(srcPath: String, dstPath: String, fileName: String): Unit = {
      val hadoopConfig = new Configuration()
      val hdfs = FileSystem.get(hadoopConfig)
      val destinationPath = new Path(dstPath)
      if (!hdfs.exists(destinationPath)) {
        hdfs.mkdirs(destinationPath)
      }
      // deleteSource = false keeps the part files; addString = null inserts no separator
      FileUtil.copyMerge(hdfs, new Path(srcPath), hdfs, new Path(dstPath + "/" + fileName), false, hadoopConfig, null)
    }
    
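    For reference, all copyMerge does is concatenate the part files in a directory into one destination file. A minimal local-filesystem sketch of the same idea (`localMerge` is a hypothetical helper using plain java.nio, no HDFS involved), useful for understanding or testing the merge step without a cluster:

    ```scala
    import java.io.File
    import java.nio.file.{Files, Paths, StandardOpenOption}
    
    // Concatenate every part-* file in srcDir into dstFile, in name order --
    // a local-filesystem sketch of what FileUtil.copyMerge does on HDFS.
    def localMerge(srcDir: String, dstFile: String): Unit = {
      val parts = new File(srcDir).listFiles()
        .filter(_.getName.startsWith("part-"))
        .sortBy(_.getName)                  // deterministic order, like copyMerge
      val out = Paths.get(dstFile)
      Files.deleteIfExists(out)
      Files.createFile(out)
      for (p <- parts)
        Files.write(out, Files.readAllBytes(p.toPath), StandardOpenOption.APPEND)
    }
    ```

    The sort by file name matters: Spark numbers its output files (part-00000, part-00001, ...), so merging in name order preserves the partition order of the original RDD.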
