How to export a table dataframe in PySpark to csv?

半阙折子戏 2020-11-27 02:33

I am using Spark 1.3.1 (PySpark) and I have generated a table using a SQL query. I now have an object that is a DataFrame. I want to export this DataFrame

5 answers
  •  庸人自扰
    2020-11-27 03:13

    You need to repartition the DataFrame into a single partition, then specify the output format, the path (in Unix file system format), and any other options for the file:

    df.repartition(1).write.format('com.databricks.spark.csv').save('/path/to/file/myfile.csv', header='true')
    

    Read more about the repartition function and the save function.

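    However, repartition is a costly operation because it triggers a full shuffle of the data, and toPandas() is worse still because it collects the entire DataFrame onto the driver. Try using .coalesce(1) instead of .repartition(1) in the code above for better performance.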

    Read more on repartition vs coalesce functions.
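
The coalesce variant suggested above could look roughly like the sketch below. The helper name `write_single_csv` and its `header` parameter are hypothetical, introduced only for illustration. It assumes `df` is a Spark DataFrame, that the spark-csv package is on the classpath, and that the Spark version provides `df.write` (Spark 1.4 or later, as far as I know).

```python
def write_single_csv(df, path, header=True):
    """Write df as CSV through a single partition.

    Hypothetical helper, not part of PySpark. Assumes `df` is a Spark
    DataFrame and that the spark-csv package is available.
    """
    # coalesce(1) merges the existing partitions into one without the
    # full shuffle that repartition(1) performs.
    writer = df.coalesce(1).write.format("com.databricks.spark.csv")
    # Extra keyword arguments to save() are passed through as data source
    # options; spark-csv expects the header flag as the string 'true'/'false'.
    writer.save(path, header="true" if header else "false")
```

Calling `write_single_csv(df, "/path/to/file/myfile.csv")` mirrors the one-liner above. Note that Spark treats the path as a directory and writes the data into a single part file inside it, so `myfile.csv` ends up being a folder rather than a plain file. On Spark 2.0 and later the CSV writer is built in, so `df.coalesce(1).write.csv(path, header=True)` should work without the external package.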
