RDD size remains the same even after compressing


Question


I use a SparkListener to monitor the sizes of cached RDDs. However, no matter what I do, the reported RDD size always stays the same. I did the following to compress the RDDs:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf().setAppName("MyApp")
conf.set("spark.rdd.compress", "true")                                       // compress serialized RDD partitions
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")   // use Kryo for serialization
// ...
val sc = new SparkContext(conf)
// ...
myrdd.persist(StorageLevel.MEMORY_ONLY_SER)   // store partitions serialized (and, given the setting above, compressed)

Even if I remove the two conf.set(...) lines shown above, the Spark listener reports the same RDD size, which suggests that setting spark.rdd.compress to true and enabling Kryo serialization had no effect (granted, Kryo only affects serialization, but spark.rdd.compress should at least have made a difference). What am I doing wrong?
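As a cross-check, the driver's own storage statistics can be queried directly instead of relying only on the listener. A minimal sketch, assuming myrdd has already been persisted as above and is materialized by an action; SparkContext.getRDDStorageInfo is available in Spark 1.6:

// Materialize the RDD so its partitions are actually cached.
myrdd.count()

// Print per-RDD storage statistics as seen by the driver.
sc.getRDDStorageInfo.foreach { info =>
  println(s"RDD ${info.id} (${info.name}): memSize=${info.memSize} B, diskSize=${info.diskSize} B, " +
    s"cached partitions=${info.numCachedPartitions}/${info.numPartitions}")
}

Here memSize is the in-memory footprint of the cached, serialized (and, if spark.rdd.compress takes effect, compressed) partitions, so toggling the setting should be visible in that number.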

Note that my RDD is of type (Long, String). Could that be the reason? That is, could it be that Spark doesn't compress RDDs of this type, especially when the strings are short?

P.S.: I am using Spark 1.6.

Source: https://stackoverflow.com/questions/40112007/rdd-size-remains-the-same-even-after-compressing
