Add a header to a text file when saving in Spark

感动是毒 2020-12-18 22:35

I have some Spark code that processes a CSV file and applies some transformations to it. I now want to save the resulting RDD as a CSV file with a header. Each line of this RDD is already formatted correctly.
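A common alternative that keeps the RDD and saveAsTextFile is to prepend the header to the first partition only, using mapPartitionsWithIndex. The sketch below assumes the header text is known up front (col1,col2 is a placeholder), and the object name HeaderSketch is hypothetical. The per-partition function is plain Scala, so it is exercised here on ordinary lists standing in for partitions rather than on a real RDD.

```scala
// Hypothetical helper; only the shape of the function matters for Spark.
object HeaderSketch {
  // Matches the (partitionIndex, lines) => lines signature that
  // RDD.mapPartitionsWithIndex expects. Only partition 0 gets the header,
  // so it is emitted once, even if that partition holds no rows.
  def withHeader(header: String)(index: Int, lines: Iterator[String]): Iterator[String] =
    if (index == 0) Iterator.single(header) ++ lines else lines
}

// Plain lists standing in for two RDD partitions of already formatted CSV lines.
val partitions = List(List("1,a", "2,b"), List("3,c"))
val written = partitions.zipWithIndex.map { case (lines, i) =>
  HeaderSketch.withHeader("col1,col2")(i, lines.iterator).toList
}
println(written) // List(List(col1,col2, 1,a, 2,b), List(3,c))

// With Spark (not run here):
//   rdd.mapPartitionsWithIndex((i, it) => HeaderSketch.withHeader("col1,col2")(i, it))
//      .saveAsTextFile("hdfs://location/to/save/csv")
```

The header lands only at the top of part-00000, so it comes first only when the part files are read or concatenated in partition order. Calling rdd.coalesce(1) before mapPartitionsWithIndex yields a single output file with the header on top, at the cost of writing everything through one task.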

5 answers
  •  情歌与酒
    2020-12-18 22:44

    A slightly different approach, using Spark SQL:

    From the question: "I now want to save this RDD as a CSV file and add a header. Each line of this RDD is already formatted correctly."

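    With Spark 2.x there are several ways to convert an RDD to a DataFrame. One is spark.createDataFrame followed by toDF to name the columns; writing the DataFrame with the CSV header option then adds the header: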

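    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    val rdd = .... // assumes an RDD of case class instances or tuples, one per row
    val df = spark.createDataFrame(rdd).toDF("col1", "col2", ... "coln")
    
    df.write
      .format("csv")
      .option("header", "true")  // writes the column names as the first line of each output part file
      .save("hdfs://location/to/save/csv")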
    

    With this approach, the Spark SQL DataFrame API can handle the whole pipeline: loading, transforming, and saving the CSV file.
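    The snippet above assumes an RDD of tuples or case class instances, whereas the question's RDD holds lines already formatted as CSV text. A minimal sketch of bridging the two, assuming a hypothetical two-column case class Record and lines without quoted fields or embedded commas: parse each line into a Record, then pass the result to createDataFrame, whose inferred column names (col1, col2) become the header. RecordParser is also hypothetical, and the Spark calls appear only as comments; the parser itself is plain Scala.

    ```scala
    // Hypothetical schema; real field names and types depend on your data.
    final case class Record(col1: String, col2: String)

    object RecordParser {
      // Naive split on commas: assumes no quoted fields or embedded commas.
      // The -1 limit keeps trailing empty fields, so "x," yields ("x", "").
      def parse(line: String): Record = line.split(",", -1) match {
        case Array(c1, c2) => Record(c1, c2)
        case fields =>
          throw new IllegalArgumentException(s"expected 2 fields, got ${fields.length}: $line")
      }
    }

    println(RecordParser.parse("1,a")) // Record(1,a)

    // With Spark (not run here):
    //   val df = spark.createDataFrame(rdd.map(RecordParser.parse))
    //   df.write.format("csv").option("header", "true").save("hdfs://location/to/save/csv")
    ```

    For real-world CSV with quoting, letting Spark parse the lines may be sturdier than a hand-written split: from Spark 2.2, DataFrameReader.csv also accepts a Dataset[String] of CSV lines.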
