How to save a DataFrame as compressed (gzipped) CSV?

Asked by 感情败类, 2020-12-30 23:09

I use Spark 1.6.0 and Scala.

I want to save a DataFrame as compressed CSV format.

Here is what I have so far (assume I already have df and …

4 Answers
  •  抹茶落季, 2020-12-30 23:49

    With Spark 2.0+, this has become a bit simpler (the snippet below uses the Python API):

    df.write.csv("path", compression="gzip")
    

    You don't need the external Databricks CSV package anymore.
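
    Under the hood, compression="gzip" simply gzip-compresses each output part file, so the result is ordinary CSV text wrapped in a gzip stream. A minimal sketch with only the Python standard library (no Spark) showing that such a file round-trips; the name part-00000.csv.gz just mimics Spark's part-file naming and is illustrative:

```python
import csv
import gzip
import os
import tempfile

# Write gzip-compressed CSV the way Spark's compression="gzip" does:
# plain CSV text wrapped in a gzip stream. "part-00000.csv.gz" mimics
# Spark's part-file naming and is an illustrative name.
rows = [["id", "name"], ["1", "alice"], ["2", "bob"]]
path = os.path.join(tempfile.mkdtemp(), "part-00000.csv.gz")

with gzip.open(path, "wt", newline="") as f:
    csv.writer(f).writerows(rows)

# Reading it back needs nothing Spark-specific: gunzip, then parse as CSV.
with gzip.open(path, "rt", newline="") as f:
    assert list(csv.reader(f)) == rows
```

    This is also why the output can be inspected with ordinary tools like zcat.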

    The csv() writer supports a number of handy options. For example:

    • sep: sets the separator (delimiter) character.
    • quote: sets the quote character used when quoting values.
    • header: whether to write a header line.
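
    The same three knobs exist in Python's standard-library csv module (delimiter, quotechar, and writing the header row yourself), which makes it easy to see what each option changes in the serialized text. A sketch using the stdlib module as a stand-in for Spark's writer, not Spark's own API:

```python
import csv
import io

# Stand-ins for Spark's sep / quote / header options: same concepts,
# different API. The values below are illustrative.
rows = [["1", "hello|world"], ["2", "plain"]]

buf = io.StringIO()
writer = csv.writer(buf, delimiter="|", quotechar="'",
                    quoting=csv.QUOTE_MINIMAL)
writer.writerow(["id", "text"])  # header line (Spark: header=True)
writer.writerows(rows)

text = buf.getvalue()
# The field containing the delimiter gets quoted; the rest do not.
assert "id|text" in text
assert "'hello|world'" in text
```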

    There are also a number of other compression codecs you can use, in addition to gzip:

    • bzip2
    • lz4
    • snappy
    • deflate
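
    gzip, bzip2, and deflate have standard-library counterparts in Python (gzip, bz2, zlib), so you can compare them on the same CSV payload; snappy and lz4 need third-party packages and are omitted here. A quick sketch:

```python
import bz2
import gzip
import zlib

# Compress the same CSV payload with codecs Spark also offers.
# snappy and lz4 have no stdlib counterpart, so they are omitted.
csv_bytes = ("id,value\n"
             + "\n".join(f"{i},{i * i}" for i in range(1000))).encode()

compressed = {
    "gzip": gzip.compress(csv_bytes),
    "bzip2": bz2.compress(csv_bytes),
    "deflate": zlib.compress(csv_bytes),
}

for name, blob in compressed.items():
    # Each codec should shrink this repetitive payload...
    assert len(blob) < len(csv_bytes)
    print(f"{name:8s} {len(blob):6d} bytes (raw: {len(csv_bytes)})")

# ...and round-trip losslessly.
assert gzip.decompress(compressed["gzip"]) == csv_bytes
assert bz2.decompress(compressed["bzip2"]) == csv_bytes
assert zlib.decompress(compressed["deflate"]) == csv_bytes
```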

    The full Spark documentation for the csv() writer covers both the Python and Scala APIs.
