Spark: saveAsTextFile without compression

Submitted by ♀尐吖头ヾ on 2019-12-04 18:42:07

Question


By default, newer versions of Spark use compression when saving text files. For example:

val txt = sc.parallelize(List("Hello", "world", "!"))
txt.saveAsTextFile("/path/to/output")

will create files in .deflate format. It is easy to change the compression codec, e.g. to gzip:

import org.apache.hadoop.io.compress._
val txt = sc.parallelize(List("Hello", "world", "!"))
txt.saveAsTextFile("/path/to/output", classOf[GzipCodec])

But is there a way to save an RDD as plain text files, i.e. without any compression?


Answer 1:


With the following code, the text files appear in HDFS without any compression.

val conf = new SparkConf().setMaster("local").setAppName("App name")
val sc = new SparkContext(conf)
// Disable output compression before writing
sc.hadoopConfiguration.set("mapred.output.compress", "false")
val txt = sc.parallelize(List("Hello", "world", "!"))
txt.saveAsTextFile("hdfs/path/to/save/file")

You can set any Hadoop-related property on sc.hadoopConfiguration in this way.
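Note that in Hadoop 2+ the old mapred.* keys are deprecated in favor of mapreduce.* equivalents. A minimal sketch using the newer property name (the output path here is a placeholder, not from the original answer):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PlainTextOutput {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("Plain text output")
    val sc = new SparkContext(conf)
    // Hadoop 2+ name for mapred.output.compress
    sc.hadoopConfiguration.set("mapreduce.output.fileoutputformat.compress", "false")
    val txt = sc.parallelize(List("Hello", "world", "!"))
    txt.saveAsTextFile("/tmp/spark-plain-text-output")  // placeholder path
    sc.stop()
  }
}
```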

Verified this code in Spark 1.5.2 (Scala 2.11).
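The same effect can also be achieved through SparkConf itself: Spark copies any property prefixed with "spark.hadoop." into the Hadoop configuration at startup. A sketch of that variant (again with a placeholder output path):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PlainTextViaSparkConf {
  def main(args: Array[String]): Unit = {
    // "spark.hadoop.<key>" properties are forwarded to sc.hadoopConfiguration
    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("Plain text via SparkConf")
      .set("spark.hadoop.mapred.output.compress", "false")
    val sc = new SparkContext(conf)
    sc.parallelize(List("Hello", "world", "!"))
      .saveAsTextFile("/tmp/spark-plain-output")  // placeholder path
    sc.stop()
  }
}
```

This is convenient when the Hadoop settings should live alongside the rest of the Spark configuration (e.g. in spark-defaults.conf) rather than in application code.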



Source: https://stackoverflow.com/questions/40263907/spark-saveastextfile-without-compression
