Question
I am using Spark with its Java API. I have loaded data into a JavaRDD<CustomizedDataStructure> like this:
JavaRDD<CustomizedDataStructure> myRDD;
When I call:
myRDD.count();
it returns a value, which shows that the RDD does contain data and is not empty.
But when I then run:
myRDD.first();
it should return a CustomizedDataStructure, but instead it fails with this error:
14:30:39,782 ERROR [TaskSetManager] Task 0.0 in stage 0.0 (TID 0) had a not serializable result:
Why is it not serializable?
Answer 1:
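When you call first(), the first element of the RDD is copied to the driver process. For that to happen, the element has to be serializable, which by default means its class must implement java.io.Serializable. My guess is that your custom class does not. That would also explain why count() works: it only sends a number back to the driver, never an instance of your class.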
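To illustrate, here is a minimal sketch that uses only the JDK, not Spark. The question does not show the fields of CustomizedDataStructure, so the ones below are assumptions, and PlainDataStructure and canJavaSerialize are hypothetical names introduced just for this example. It shows that plain Java serialization accepts a class only once it implements java.io.Serializable, which is the check behind the "not serializable result" error.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializableCheck {

    // Hypothetical stand-in for the asker's class; the fields are assumptions.
    static class CustomizedDataStructure implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final String label;

        CustomizedDataStructure(int id, String label) {
            this.id = id;
            this.label = label;
        }
    }

    // A similar class that does NOT implement Serializable, for comparison.
    static class PlainDataStructure {
        final int id;

        PlainDataStructure(int id) {
            this.id = id;
        }
    }

    // Returns true if standard Java serialization can write the object,
    // false if it is rejected with NotSerializableException.
    static boolean canJavaSerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canJavaSerialize(new CustomizedDataStructure(1, "a"))); // true
        System.out.println(canJavaSerialize(new PlainDataStructure(1)));           // false
    }
}
```

If your real class holds fields of other custom types, those types need to be serializable too (or the fields marked transient), otherwise the same exception is raised for the nested object.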
Source: https://stackoverflow.com/questions/27450944/spark-can-not-access-first-element-in-an-javardd-using-first