Write single CSV file using spark-csv

心在旅途 2020-11-22 08:43

I am using https://github.com/databricks/spark-csv and trying to write a single CSV file, but I am not able to: it creates a folder instead.

I need a Scala function which will take a path and a file name as parameters and write out a single CSV file.
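
For example, this is roughly what I am running (a minimal sketch; df and the output path are placeholders):

    df.write
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .save("output/result.csv")
    // "output/result.csv" ends up as a *directory* containing
    // part-00000, part-00001, ..., plus a _SUCCESS marker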

13 Answers
  •  闹比i (OP) 2020-11-22 09:15

    If you are running Spark on HDFS, I've been solving this by writing the CSV files normally and letting HDFS do the merging. I do that directly in Spark (1.6):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs._

    def merge(srcPath: String, dstPath: String): Unit = {
      val hadoopConfig = new Configuration()
      val hdfs = FileSystem.get(hadoopConfig)
      // the "true" argument deletes the source files once they are merged into the new output
      FileUtil.copyMerge(hdfs, new Path(srcPath), hdfs, new Path(dstPath), true, hadoopConfig, null)
    }

    val newData = << create your dataframe >>

    val outputfile = "/user/feeds/project/outputs/subject"
    val filename = "myinsights"
    val outputFileName = outputfile + "/temp_" + filename
    val mergedFileName = outputfile + "/merged_" + filename
    val mergeFindGlob  = outputFileName

    // Write the part files to a temporary directory,
    // then merge them into a single file (cleaning up the temp files)
    newData.write
      .format("com.databricks.spark.csv")
      .option("header", "false")
      .mode("overwrite")
      .save(outputFileName)
    merge(mergeFindGlob, mergedFileName)
    newData.unpersist()

    Can't remember where I learned this trick, but it might work for you.
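
    One caveat: FileUtil.copyMerge was removed in Hadoop 3.0, so on newer clusters you would have to roll the merge yourself. A rough sketch of an equivalent (the helper name copyMergeManually is just illustrative):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.io.IOUtils

    def copyMergeManually(srcDir: String, dstFile: String): Unit = {
      val fs = FileSystem.get(new Configuration())
      val out = fs.create(new Path(dstFile))
      try {
        // Concatenate the part files into the destination, in name order
        fs.listStatus(new Path(srcDir))
          .filter(_.getPath.getName.startsWith("part-"))
          .sortBy(_.getPath.getName)
          .foreach { status =>
            val in = fs.open(status.getPath)
            try IOUtils.copyBytes(in, out, 4096, false) finally in.close()
          }
      } finally out.close()
    }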
