I use Spark 1.6.0 and Scala.
I want to save a DataFrame in compressed CSV format.
Here is what I have so far (assume I already have df
and
This code works for Spark 2.1, where .codec is not available:
df.write
  .format("com.databricks.spark.csv")
  .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
  .save(my_directory)
For Spark 2.2, you can use the compression argument of the built-in CSV writer, e.g. df.write.csv(..., compression="gzip") in PySpark, described here: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=codec
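
Since the question is about Scala, here is a minimal sketch of what the same thing could look like there, assuming Spark 2.x with the built-in CSV source. The object name, the sample DataFrame, the local master and the output path /tmp/gzip_csv_example are all made up for illustration; swap in your own df and directory.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object GzipCsvExample {
  // Writes df as gzip-compressed CSV part files under `path`.
  // "compression" is an option of the built-in CSV source in Spark 2.x.
  def writeGzipCsv(df: DataFrame, path: String): Unit =
    df.write
      .mode("overwrite")               // replace the directory if it already exists
      .option("header", "true")
      .option("compression", "gzip")
      .csv(path)

  def main(args: Array[String]): Unit = {
    // Local session only for the sake of the example (assumption).
    val spark = SparkSession.builder()
      .appName("gzip-csv-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the real df.
    val df = Seq((1, "a"), (2, "b")).toDF("id", "label")

    writeGzipCsv(df, "/tmp/gzip_csv_example")
    spark.stop()
  }
}
```

The output is a directory of part files rather than a single file. If you need one file, calling df.coalesce(1) before the write is a common workaround, at the cost of writing through a single task.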