I use Spark 1.6.0 and Scala.
I want to save a DataFrame as compressed CSV format.
Here is what I have so far (assume I already have df and
With Spark 2.0+, this has become a bit simpler. In the Python API:

df.write.csv("path", compression="gzip")
You don't need the external Databricks CSV package anymore.
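Since the question uses Scala, here is the equivalent write in the Scala API, where compression is passed via option() rather than a keyword argument. This is a minimal sketch: the app name, input path, and output directory are placeholders, and df stands in for whatever DataFrame you already have.

```scala
import org.apache.spark.sql.SparkSession

// Placeholder setup; in the question, df already exists.
val spark = SparkSession.builder().appName("csv-write-example").getOrCreate()
val df = spark.read.json("input.json") // any DataFrame will do

// Write df as gzip-compressed CSV (Spark 2.0+, no external package needed).
df.write
  .option("compression", "gzip")
  .csv("output-dir")
```

Spark writes a directory of part files, so "output-dir" names a directory, not a single .csv.gz file.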
The csv() writer supports a number of handy options. For example:
- sep: sets the separator character.
- quote: whether and how to quote values.
- header: whether to include a header line.

There are also a number of other compression codecs you can use, in addition to gzip:

- bzip2
- lz4
- snappy
- deflate

The full Spark docs for the csv() writer are here: Python / Scala
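Putting several of those options together in one Scala write might look like this sketch (the separator choice, output directory, and df itself are placeholders, not anything from the question):

```scala
// Sketch: combining csv() writer options (Spark 2.0+ Scala API).
// Assumes df is an existing DataFrame and "output-dir" is your target directory.
df.write
  .option("sep", "|")              // field separator instead of the default ","
  .option("quote", "\"")           // quote character for values containing the separator
  .option("header", "true")        // emit a header line with column names
  .option("compression", "bzip2")  // any codec from the list above
  .csv("output-dir")
```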