Spark - How to write a single csv file WITHOUT folder?

北恋 2020-12-28 13:44

Suppose that df is a DataFrame in Spark. The way to write df into a single CSV file is

df.coalesce(1).write.option("header", "true").csv(...)

9 Answers
  •  死守一世寂寞
    2020-12-28 14:22

    If the result size is comparable to the Spark driver node's free memory, you may run into trouble converting the DataFrame to pandas.
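
    For reference, a minimal sketch of the pandas conversion being cautioned against (the output path is illustrative):

    # toPandas() collects the entire result on the driver, so this only
    # works when the output fits in the driver's memory.
    df.toPandas().to_csv("/export/report.csv", index=False)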

    I would tell Spark to save to a temporary location, then copy the single part file it produces into the desired path. Something like this:

    import os
    import shutil
    
    TEMPORARY_TARGET = "big/storage/name"
    DESIRED_TARGET = "/export/report.csv"
    
    # Write the DataFrame as a single part file inside a temporary directory.
    df.coalesce(1).write.option("header", "true").csv(TEMPORARY_TARGET)
    
    # Locate the part file Spark produced in that directory.
    part_filename = next(entry for entry in os.listdir(TEMPORARY_TARGET) if entry.startswith('part-'))
    temporary_csv = os.path.join(TEMPORARY_TARGET, part_filename)
    
    # Copy it to the desired single-file destination.
    shutil.copyfile(temporary_csv, DESIRED_TARGET)
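
    If the temporary directory is not needed afterwards, it can be removed with shutil.rmtree(TEMPORARY_TARGET).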
    

    If you work with Databricks, Spark addresses files as dbfs:/mnt/..., so to use Python's file operations on them you need to change the path to /dbfs/mnt/..., or (more native to Databricks) replace shutil.copyfile with dbutils.fs.cp.
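
    A minimal sketch of that dbutils.fs.cp variant, assuming hypothetical dbfs:/mnt/... mount paths:

    TEMPORARY_TARGET = "dbfs:/mnt/big/storage/name"   # hypothetical mount path
    DESIRED_TARGET = "dbfs:/mnt/export/report.csv"    # hypothetical mount path
    
    df.coalesce(1).write.option("header", "true").csv(TEMPORARY_TARGET)
    
    # dbutils.fs.ls returns FileInfo objects; .name holds each entry's file name.
    part_filename = next(f.name for f in dbutils.fs.ls(TEMPORARY_TARGET)
                         if f.name.startswith("part-"))
    dbutils.fs.cp(TEMPORARY_TARGET + "/" + part_filename, DESIRED_TARGET)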
