Spark - How to write a single csv file WITHOUT folder?

Asked by 北恋 on 2020-12-28 13:44

Suppose that df is a DataFrame in Spark. The way to write df into a single CSV file is

    df.coalesce(1).write.option("header", "true").csv("name.csv")

This creates a folder called name.csv containing a part-00000-* file, rather than a single CSV file named name.csv. Is there a way to avoid the folder and get just the file?
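
For context, a minimal sketch of what that call actually produces (assuming a local SparkSession; the part-file hash differs per run):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

    # Despite the .csv("name.csv") call, Spark creates a DIRECTORY:
    #   name.csv/
    #     _SUCCESS
    #     part-00000-<uuid>-c000.csv   <- the actual data
    df.coalesce(1).write.option("header", "true").csv("name.csv")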

9 Answers
  •  南笙 (original poster) · 2020-12-28 14:11

    Create a temp folder inside the output folder, write the single-partition CSV there, copy the part-00000* file into the output folder under the desired file name, then delete the temp folder. Python snippet to do this in Databricks:

    # 'output' is the destination folder; stage the write in a temp subfolder
    fpath = output + '/temp'

    def file_exists(path):
        # dbutils.fs.ls raises if the path does not exist
        try:
            dbutils.fs.ls(path)
            return True
        except Exception as e:
            if 'java.io.FileNotFoundException' in str(e):
                return False
            raise

    # Clear any leftover temp folder, then write a single-partition CSV into it
    if file_exists(fpath):
        dbutils.fs.rm(fpath, True)  # recursive=True: a plain rm fails on non-empty folders
    df.coalesce(1).write.option("header", "true").csv(fpath)

    # Copy the lone part file out under the desired name, then drop the temp folder
    fname = [x.name for x in dbutils.fs.ls(fpath) if x.name.startswith('part-00000')]
    dbutils.fs.cp(fpath + "/" + fname[0], output + "/" + "name.csv")
    dbutils.fs.rm(fpath, True)
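
    Outside Databricks there is no dbutils, but the same move can be made with the Hadoop FileSystem API that ships with Spark. A minimal sketch, assuming an active SparkSession named spark and illustrative paths; note that the _jvm and _jsc handles are internal rather than public API:

    # Same temp-folder trick via Hadoop's FileSystem (no dbutils needed)
    hadoop = spark._jvm.org.apache.hadoop.fs
    fs = hadoop.FileSystem.get(spark._jsc.hadoopConfiguration())

    tmp = "output/temp"  # illustrative path inside the destination folder
    df.coalesce(1).write.option("header", "true").mode("overwrite").csv(tmp)

    # Locate the single part file and rename it into place
    part = [f.getPath() for f in fs.listStatus(hadoop.Path(tmp))
            if f.getPath().getName().startswith("part-")][0]
    fs.rename(part, hadoop.Path("output/name.csv"))
    fs.delete(hadoop.Path(tmp), True)  # recursive delete of the temp folder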
    
    
