How to save a DataFrame as compressed (gzipped) CSV?

感情败类 2020-12-30 23:09

I use Spark 1.6.0 and Scala.

I want to save a DataFrame as compressed CSV format.

Here is what I have so far (assume I already have df and

4 Answers
  •  無奈伤痛
    2020-12-30 23:52

    This code works for Spark 2.1, where .codec is not available.

    df.write
      .format("com.databricks.spark.csv")
      .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
      .save(my_directory)
    

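    For Spark 2.2, you can use the compression option of the built-in CSV writer, e.g. df.write.csv(..., compression="gzip") in PySpark (the keyword argument is named compression, not codec), described here: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=codec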
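
Since the question is about Scala, here is a rough Scala sketch of the same idea using the built-in CSV writer. It assumes Spark 2.0 or later (it does not apply to the asker's Spark 1.6.0, which needs the com.databricks.spark.csv package as shown above). The sample DataFrame, the local master setting, and the output path /tmp/gzip_csv_example are made up for illustration and are not from the original post.

```scala
import org.apache.spark.sql.SparkSession

object GzipCsvExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .appName("gzip-csv-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the asker's `df`.
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // Hypothetical output directory; Spark writes part files into it.
    val outputDir = "/tmp/gzip_csv_example"

    df.write
      .mode("overwrite")              // replace the directory if it already exists
      .option("header", "true")       // write column names as the first line
      .option("compression", "gzip")  // gzip each part file
      .csv(outputDir)

    spark.stop()
  }
}
```

The result is a directory of gzipped part files rather than a single .csv.gz file. If one file is needed, calling df.coalesce(1) before .write is a common workaround, at the cost of writing through a single task.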
