PySpark: spit out single file when writing instead of multiple part files

北荒 2021-01-02 12:50

Is there a way to prevent PySpark from creating several small files when writing a DataFrame to a JSON file?

If I run:

 df.write.format('json').save(...)


        
3 Answers
  •  庸人自扰
    2021-01-02 13:30

    This was a better solution for me.

    import json
    rdd.map(json.dumps).saveAsTextFile(json_lines_file_name)  # one JSON object per line
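Note that saveAsTextFile, like df.write, still writes a directory containing one part-NNNNN file per partition, so on its own it does not guarantee a single file. Two common ways to end up with one file are to call coalesce(1) before writing (for example rdd.coalesce(1).map(json.dumps).saveAsTextFile(...)), which funnels all data through a single task, or to merge the part files after the job finishes. Below is a minimal sketch of the second approach in plain Python. It assumes the output directory is on the local filesystem (not HDFS or S3) and that the data files are named part-*; the helper name merge_part_files is hypothetical.

```python
import json
from pathlib import Path


def merge_part_files(output_dir, merged_file):
    """Concatenate part-* files from a Spark output directory into one file.

    Hypothetical helper. Assumes a local directory where data files are named
    part-00000, part-00001, ... Marker files such as _SUCCESS and hidden .crc
    files do not match the pattern and are skipped.
    """
    parts = sorted(p for p in Path(output_dir).glob("part-*") if p.is_file())
    with open(merged_file, "w", encoding="utf-8") as out:
        for part in parts:
            text = part.read_text(encoding="utf-8")
            # Keep one JSON object per line across file boundaries.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return len(parts)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp) / "out"
        out_dir.mkdir()
        # Simulate a two-partition JSON Lines output plus the _SUCCESS marker.
        (out_dir / "part-00000").write_text(json.dumps({"id": 1}) + "\n", encoding="utf-8")
        (out_dir / "part-00001").write_text(json.dumps({"id": 2}) + "\n", encoding="utf-8")
        (out_dir / "_SUCCESS").write_text("", encoding="utf-8")
        merged = Path(tmp) / "merged.jsonl"
        print(merge_part_files(out_dir, merged), "part files merged")
```

Sorting the part files keeps the merged output in partition order. For very large outputs, coalesce(1) or a post-job merge both end up going through a single writer, so either can become a bottleneck.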
