Add a header before text file on save in Spark

感动是毒 2020-12-18 22:35

I have some Spark code that processes a CSV file and applies some transformations to it. I now want to save the resulting RDD as a CSV file with a header line added. Each line of this RDD is already

5 answers
  •  渐次进展
    2020-12-18 22:47

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    def addHeaderToRdd(sparkCtx: SparkContext, lines: RDD[String], header: String): RDD[String] = {
        // Index the header with -1 so that sorting by key puts it on top.
        val headerRDD = sparkCtx.parallelize(List((-1L, header)))

        // Pair every line with its position, then swap so the index becomes the key.
        val indexedLines = lines.zipWithIndex().map { case (line, idx) => (idx, line) }

        // Sorting by key restores the original line order, with the header first.
        indexedLines.union(headerRDD).sortByKey().values
    }
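
A possible way to use this helper when actually writing the file is sketched below. It is only an illustration under some assumptions: the sample rows, the header `id,name`, the local master, and the temporary output directory are all made up for the example. It also varies the answer slightly by passing `numPartitions = 1` to `sortByKey`, so that `saveAsTextFile` produces a single part file with the header on its first line; with several partitions the header would only appear at the top of the first part file.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SaveWithHeaderExample {
  // Same idea as the answer: key the header with -1, key every line with
  // its index, then sort by key. Variation (an assumption of this sketch):
  // sort into one partition so the output is a single part file.
  def addHeaderToRdd(sc: SparkContext, lines: RDD[String], header: String): RDD[String] = {
    val headerRDD = sc.parallelize(List((-1L, header)))
    val indexed = lines.zipWithIndex().map { case (line, idx) => (idx, line) }
    indexed.union(headerRDD).sortByKey(ascending = true, numPartitions = 1).values
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("save-with-header").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical data standing in for the transformed CSV lines.
      val lines = sc.parallelize(Seq("1,alice", "2,bob"))
      val withHeader = addHeaderToRdd(sc, lines, "id,name")

      // saveAsTextFile writes a directory of part files and fails if the
      // directory already exists, so a fresh sub-path of a temp dir is used.
      val outDir = Files.createTempDirectory("spark-csv").resolve("out").toString
      withHeader.saveAsTextFile(outDir)
      println(s"Wrote CSV with header to $outDir")
    } finally {
      sc.stop()
    }
  }
}
```

Note that the extra sort shuffles the whole dataset, which may be costly for large inputs; the sketch favours simplicity over performance.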
    
