Add a header before text file on save in Spark

感动是毒 2020-12-18 22:35

I have some Spark code that processes a CSV file and applies some transformations to it. I now want to save the resulting RDD as a CSV file and add a header. Each line of this RDD is already

5 answers
  •  情深已故
    2020-12-18 22:41

    You can make an RDD out of your header line and then union it, yes:

    import org.apache.spark.rdd.RDD

    val rdd: RDD[String] = ...
    // One-element RDD that holds only the header line
    val header: RDD[String] = sc.parallelize(Array("my,header,row"))
    header.union(rdd).saveAsTextFile(...)
    

    Then you end up with a bunch of part-xxxxx files that you merge.

    The catch is that I don't think there is any guarantee that the header will be the first partition, and so land in part-00000 at the top of your file. In practice, I'm pretty sure it will.

    A more reliable approach is to merge the part-xxxxx files with Hadoop command-line tools such as hdfs and, as part of that command, prepend the header line from a file.
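
    As a rough illustration of the merge-with-header idea, here is a minimal sketch in plain Scala. It assumes the part-xxxxx files have already been copied to a local directory; for files still on HDFS you would need the Hadoop tools or the Hadoop FileSystem API instead. The object and method name (MergeWithHeader.mergeWithHeader) are made up for this example.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

object MergeWithHeader {
  // Hypothetical helper: writes `header` first, then appends every
  // part-* file found in `dir`, in file-name order, to `target`.
  def mergeWithHeader(dir: File, target: File, header: String): Unit = {
    // listFiles() returns null if `dir` is not a readable directory.
    // The "part-" prefix filter skips _SUCCESS and hidden .crc files.
    val parts = Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.startsWith("part-"))
      .sortBy(_.getName) // part numbers are zero-padded, so this is numeric order

    val out = new PrintWriter(target, "UTF-8")
    try {
      out.println(header) // the header always goes on the first line
      parts.foreach { part =>
        val src = Source.fromFile(part, "UTF-8")
        try src.getLines().foreach(line => out.println(line))
        finally src.close()
      }
    } finally out.close()
  }
}
```

    With placeholder paths, a call might look like MergeWithHeader.mergeWithHeader(new File("out"), new File("result.csv"), "my,header,row"). Because the header is written explicitly before any part file is read, its position does not depend on how Spark ordered the partitions.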
