Serializing an RDD

Submitted by 人盡茶涼 on 2019-11-28 01:29:23

I'm the author of this warning message.

Spark does not support performing actions and transformations on copies of RDDs that are created via deserialization. RDDs are serializable so that certain methods on them can be invoked in executors, but end users shouldn't try to manually perform RDD serialization.

When an RDD is serialized, it loses its reference to the SparkContext that created it, which prevents jobs from being launched from it. In earlier versions of Spark, your code would result in a NullPointerException when Spark tried to access the private, null RDD.sc field.
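The mechanics can be illustrated without Spark. Below is a minimal Python sketch of the same transient-reference pattern: a toy "RDD" deliberately drops its "context" during serialization, so a deserialized copy can no longer launch work. The names FakeContext, FakeRDD, and run_job are made up for this illustration; they are not Spark's API, only an analogy for the transient RDD.sc field.

```python
import pickle

class FakeContext:
    """Hypothetical stand-in for SparkContext; only the driver holds a live one."""
    def run_job(self):
        return "job result"

class FakeRDD:
    """Toy analogue of an RDD that holds a transient context reference."""
    def __init__(self, ctx):
        self.ctx = ctx

    def __getstate__(self):
        # Like the @transient sc field on a real RDD, the context is
        # dropped when this object is serialized.
        state = self.__dict__.copy()
        state["ctx"] = None
        return state

    def count(self):
        if self.ctx is None:
            # Analogous to Spark's "This RDD lacks a SparkContext" error
            # (or the NullPointerException in earlier versions).
            raise RuntimeError("This RDD lacks a context; "
                               "actions can only be invoked on the driver")
        return self.ctx.run_job()

rdd = FakeRDD(FakeContext())
print(rdd.count())  # → job result

# A serialize/deserialize round trip loses the context.
copy = pickle.loads(pickle.dumps(rdd))
try:
    copy.count()
except RuntimeError as e:
    print(e)
```

The deserialized copy is structurally intact but useless for actions, which is exactly why manually serializing an RDD on the driver and deserializing it later cannot work.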

This error message was worded this way because users were frequently running into confusing NullPointerExceptions when trying to do things like rdd1.map { _ => rdd2.count() }, which caused actions to be invoked on deserialized RDDs on executor machines. I didn't anticipate that anyone would try to manually serialize / deserialize their RDDs on the driver, so I can see how this error message could be slightly misleading.
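That nested-action anti-pattern, and its usual fix, can be sketched with the same pickle analogy. Everything here (FakeContext, FakeRDD, the closure classes) is hypothetical illustration, not Spark code: the point is that a closure shipped to an executor carries a deserialized copy of any RDD it captures, while shipping a plain driver-side value works fine.

```python
import pickle

class FakeContext:
    """Hypothetical stand-in for SparkContext."""
    def count_job(self, data):
        return len(data)

class FakeRDD:
    """Toy RDD whose context is transient, like RDD.sc."""
    def __init__(self, ctx, data):
        self.ctx, self.data = ctx, data

    def __getstate__(self):
        state = self.__dict__.copy()
        state["ctx"] = None  # dropped on serialization
        return state

    def count(self):
        if self.ctx is None:
            raise RuntimeError("actions can only be invoked on the driver")
        return self.ctx.count_job(self.data)

class CountClosure:
    """Stand-in for a task closure like `_ => rdd2.count()` that captures rdd2."""
    def __init__(self, rdd):
        self.rdd = rdd
    def __call__(self, x):
        return self.rdd.count()

class FixedClosure:
    """Stand-in for a closure that captures only a plain, pre-computed value."""
    def __init__(self, n):
        self.n = n
    def __call__(self, x):
        return self.n

rdd2 = FakeRDD(FakeContext(), [1, 2, 3])

# Anti-pattern: the closure is serialized to the executor, and the
# captured rdd2 arrives without its context.
shipped = pickle.loads(pickle.dumps(CountClosure(rdd2)))
try:
    shipped(0)
except RuntimeError as e:
    print(e)

# Fix: run the action on the driver first, ship only the result.
n = rdd2.count()
fixed = pickle.loads(pickle.dumps(FixedClosure(n)))
print(fixed(0))  # → 3
```

In real Spark code the fix takes the same shape: compute rdd2.count() (or collect/broadcast the needed data) on the driver, then reference only that plain value inside the function passed to rdd1.map.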
