How to export a table dataframe in PySpark to csv?

半阙折子戏 · 2020-11-27 02:33

I am using Spark 1.3.1 (PySpark) and I have generated a table using a SQL query. I now have an object that is a DataFrame. I want to export this DataFrame to a CSV file.

5 Answers
  •  情书的邮戳
    2020-11-27 03:27

    If you cannot use spark-csv, you can do the following:

    df.rdd.map(lambda x: ",".join(map(str, x))).coalesce(1).saveAsTextFile("file.csv")
    
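    For comparison, the spark-csv route this answer works around would look roughly like the sketch below. This is only a sketch under an assumption: it requires launching PySpark with the com.databricks:spark-csv package on the classpath, which is exactly what the answer supposes you cannot do.

    # Sketch: writing with spark-csv on Spark 1.3.x (assumes the package is available).
    # DataFrame.save(path, source) was the 1.3-era API for external data sources.
    df.save("file.csv", "com.databricks.spark.csv")
    

    Also note that coalesce(1) only shuffles everything into a single partition; saveAsTextFile still writes a directory named file.csv containing a part-00000 file, not a single flat file.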

    That will not work if your strings contain linebreaks or commas. In that case, use this:

    import csv
    import cStringIO  # Python 2; on Python 3 use io.StringIO instead
    
    def row2csv(row):
        # Run each Row through the csv module so commas, quotes and
        # linebreaks inside fields are quoted and escaped properly.
        buffer = cStringIO.StringIO()
        writer = csv.writer(buffer)
        # str(s).encode("utf-8") assumes ASCII-safe values; for non-ASCII
        # unicode fields, unicode(s).encode("utf-8") is safer in Python 2.
        writer.writerow([str(s).encode("utf-8") for s in row])
        buffer.seek(0)
        return buffer.read().strip()
    
    df.rdd.map(row2csv).coalesce(1).saveAsTextFile("file.csv")
    
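    If the DataFrame is small enough to fit in driver memory, another option (assuming pandas is installed on the driver) is to collect it and let pandas write a single local CSV file:

    # Sketch: collect the DataFrame to the driver and write one local CSV.
    # Only suitable for data that fits in driver memory.
    df.toPandas().to_csv("file.csv", header=True, index=False)
    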
