Add a header before text file on save in Spark

感动是毒 2020-12-18 22:35

I have some Spark code that processes a CSV file and applies some transformations to it. I now want to save the resulting RDD as a CSV file with a header line. Each line of this RDD is already

5 answers

情深已故 (original poster)

2020-12-18 22:41
You can make an RDD out of your header line and then union it, yes:
```
import org.apache.spark.rdd.RDD

val rdd: RDD[String] = ...
// One-element RDD holding the header; it goes first in the union.
val header: RDD[String] = sc.parallelize(Seq("my,header,row"))
header.union(rdd).saveAsTextFile(...)
```
This writes a directory of part-xxxxx files, which you then merge into a single file.

The problem is that I don't think you're guaranteed that the header's partition will be the first one, and so that the header will end up in part-00000 and at the top of the merged file. In practice, I'm pretty sure it will.
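If you would rather not depend on how the union orders its partitions, one alternative (a different technique from the union above, not something the original code does) is to prepend the header inside partition 0 of the data RDD itself. The sketch below is only an illustration: `HeaderFirst` and `withHeader` are hypothetical names, and it assumes the RDD has at least one partition.

```scala
import org.apache.spark.rdd.RDD

object HeaderFirst {
  // Hypothetical helper: emits `header` ahead of the lines of partition 0
  // and leaves every other partition unchanged, so the header is written
  // as the first line of part-00000.
  def withHeader(rdd: RDD[String], header: String): RDD[String] =
    rdd.mapPartitionsWithIndex { (index, lines) =>
      if (index == 0) Iterator(header) ++ lines else lines
    }
}
```

You would then call `saveAsTextFile` on the result of `HeaderFirst.withHeader(rdd, "my,header,row")` instead of on the union.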

A more reliable approach than counting on partition order is to merge the part-xxxxx files with Hadoop command-line tools (for example `hdfs dfs -cat` or `hdfs dfs -getmerge`) and, as part of that step, put the header line in front from a separate file.
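For output that sits on a local filesystem rather than HDFS, the same merge-with-header idea can be sketched in plain Scala. This is a minimal illustration, not a replacement for the Hadoop tools: `MergeParts` and `mergeWithHeader` are hypothetical names, and it assumes the part files are UTF-8 text whose zero-padded names sort in the right order.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

object MergeParts {
  // Hypothetical helper: writes `header` first, then the lines of every
  // part-* file in `dir` in name order, into `target`.
  // Files such as _SUCCESS or .part-00000.crc are skipped by the name filter.
  def mergeWithHeader(dir: File, header: String, target: File): Unit = {
    val parts = Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.startsWith("part-"))
      .sortBy(_.getName)

    val out = new PrintWriter(target, "UTF-8")
    try {
      out.println(header)
      for (part <- parts) {
        val src = Source.fromFile(part, "UTF-8")
        try src.getLines().foreach(line => out.println(line))
        finally src.close()
      }
    } finally out.close()
  }
}
```

The target file should not itself be named `part-...` inside the same directory, otherwise a second run would pick it up as input.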