Spark: cannot access the first element in a JavaRDD using first()

Submitted by 走远了吗 on 2019-12-24 03:07:25

Question


I am using Spark with its Java API. I have loaded data into a JavaRDD<CustomizedDataStructure> like this:

JavaRDD<CustomizedDataStructure> myRDD;

When I call:

myRDD.count();

it returns a value, which shows that the RDD does contain data and is not empty. But when I then run:

myRDD.first();

it should return a CustomizedDataStructure, but instead it fails with this error:

14:30:39,782 ERROR [TaskSetManager] Task 0.0 in stage 0.0 (TID 0) had a not serializable result:

Why is it not serializable?


Answer 1:


When you call first(), Spark copies the first element of the RDD to the driver process. For that to happen, the element has to be serializable, which by default means its class must implement java.io.Serializable. My guess is that your custom class does not.
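
This also explains why count() succeeds: it only sends a number back to the driver, never an element. The sketch below does not run Spark. It reproduces the same check with plain Java serialization (ObjectOutputStream), which Spark's default serializer is built on. PlainRecord, SerializableRecord, and SerializationCheck are hypothetical names for illustration; PlainRecord stands in for the question's CustomizedDataStructure, whose real definition is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for CustomizedDataStructure: does NOT implement Serializable.
class PlainRecord {
    final String name;
    PlainRecord(String name) { this.name = name; }
}

// Same data, but the class is marked Serializable.
class SerializableRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    SerializableRecord(String name) { this.name = name; }
}

class SerializationCheck {
    // Returns true if Java serialization can write the object, false if it
    // is rejected with NotSerializableException.
    static boolean canSerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // An element like the one first() would have to ship to the driver.
        System.out.println(canSerialize(new PlainRecord("a")));        // prints "false"
        System.out.println(canSerialize(new SerializableRecord("a"))); // prints "true"
        // A boxed long, like the result of count(), serializes without trouble.
        System.out.println(canSerialize(Long.valueOf(42L)));           // prints "true"
    }
}
```

If the guess is right, declaring the custom class as "implements java.io.Serializable" (with every non-transient field serializable as well) should let first() return the element. Another option, assuming the class cannot be changed, is to switch Spark to Kryo by setting spark.serializer to org.apache.spark.serializer.KryoSerializer.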



Source: https://stackoverflow.com/questions/27450944/spark-can-not-access-first-element-in-an-javardd-using-first
