The following Java program was written to experiment with Apache Spark.
The program tries to read lists of positive and negative words from their respective files, compa
When you create an anonymous class that uses a local variable of the enclosing method, the compiler copies that variable into a hidden field of the generated class:
    JavaRDD<String> numAs = positiveComments.filter(new Function<String, Boolean>()
    {
        public Boolean call(String s)
        {
            return s.contains(iterator.next());
        }
    });
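The compiler turns it into something like this (javac actually names the field val$iterator and assigns it in a generated constructor):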
    JavaRDD<String> numAs = positiveComments.filter(new Function<String, Boolean>()
    {
        // hidden copy of the captured local variable
        private Iterator<...> $iterator;
        public Boolean call(String s)
        {
            return s.contains($iterator.next());
        }
    });
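To see the hidden field for yourself, the sketch below builds the same kind of anonymous class outside Spark and lists its declared fields with reflection. SerializableFilter is a hypothetical stand-in for Spark's Function interface (it mirrors only the fact that Function extends Serializable), and the exact field name depends on the compiler; with javac it is typically val$iterator.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CaptureDemo {
    // Hypothetical stand-in for Spark's Function: the only property it mirrors
    // is that it extends Serializable.
    interface SerializableFilter extends Serializable {
        boolean call(String s);
    }

    // Created in a static method, so the anonymous class captures only `iterator`
    // (there is no enclosing instance to capture).
    static SerializableFilter makeFilter(Iterator<String> iterator) {
        return new SerializableFilter() {
            @Override
            public boolean call(String s) {
                return s.contains(iterator.next());
            }
        };
    }

    // Lists "name : type" for every field declared on the object's class,
    // including synthetic fields generated by the compiler.
    static List<String> declaredFields(Object o) {
        List<String> result = new ArrayList<>();
        for (Field f : o.getClass().getDeclaredFields()) {
            result.add(f.getName() + " : " + f.getType().getSimpleName());
        }
        return result;
    }

    public static void main(String[] args) {
        SerializableFilter filter = makeFilter(Arrays.asList("good", "great").iterator());
        // With javac this typically prints [val$iterator : Iterator]
        System.out.println(declaredFields(filter));
    }
}
```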
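Because of this hidden field, serializing the function also serializes the Iterator. Spark has to serialize the function to send it to the executors, and since the Iterator is not serializable, the job can fail with a NotSerializableException (usually reported as a "Task not serializable" SparkException).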
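The failure can be reproduced without a cluster, because Spark serializes closures with plain Java serialization by default. The sketch below assumes the same hypothetical SerializableFilter stand-in for Spark's Function and writes the filter to an in-memory ObjectOutputStream; the anonymous class itself is serializable, but the captured Iterator is not.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;

public class IteratorCaptureDemo {
    // Hypothetical stand-in for Spark's Function, which also extends Serializable.
    interface SerializableFilter extends Serializable {
        boolean call(String s);
    }

    // Same shape as the original filter: the anonymous class captures the Iterator.
    static SerializableFilter iteratorCapturingFilter(Iterator<String> iterator) {
        return new SerializableFilter() {
            @Override
            public boolean call(String s) {
                return s.contains(iterator.next());
            }
        };
    }

    // Tries plain Java serialization; returns false on NotSerializableException.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            // The message is the name of the class that could not be serialized
            System.out.println("NotSerializableException: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Iterator<String> words = Arrays.asList("good", "great").iterator();
        System.out.println(isSerializable(iteratorCapturingFilter(words))); // false
    }
}
```

The reported class is a JDK-internal iterator implementation, so its exact name varies between Java versions.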
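To avoid that, call next() before creating the function, so that only the resulting String, which is serializable, is captured:
    // final is required on Java 7; on Java 8+ an effectively final variable is enough
    final String value = iterator.next();
    JavaRDD<String> numAs = positiveComments.filter(new Function<String, Boolean>()
    {
        public Boolean call(String s)
        {
            return s.contains(value);
        }
    });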
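The same JDK-only check confirms the fix. This sketch again assumes a hypothetical SerializableFilter in place of Spark's Function; the helper name containsNextWord is illustrative. The value is read once, before the function is created (on the driver, in Spark terms), so the anonymous class holds only a String.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;

public class ValueCaptureDemo {
    // Hypothetical stand-in for Spark's Function, which extends Serializable.
    interface SerializableFilter extends Serializable {
        boolean call(String s);
    }

    // Hypothetical helper: reads the next word first, then captures only that String.
    static SerializableFilter containsNextWord(Iterator<String> iterator) {
        final String value = iterator.next();
        return new SerializableFilter() {
            @Override
            public boolean call(String s) {
                return s.contains(value); // only the String is captured
            }
        };
    }

    // Tries plain Java serialization; returns false on NotSerializableException.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SerializableFilter filter = containsNextWord(Arrays.asList("good", "great").iterator());
        System.out.println(isSerializable(filter));       // true
        System.out.println(filter.call("a good movie"));  // true
        System.out.println(filter.call("a great movie")); // false: only "good" was captured
    }
}
```

Each call to next() yields one word, so filtering against every word means looping while iterator.hasNext() and creating one such function per word.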
Some Java facts
Some facts about Spark
Rules of thumb to avoid serialization problems:
For an in-depth understanding, see http://bytepadding.com/big-data/spark/understanding-spark-serialization/
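One commonly used way to stay clear of capture problems altogether is to put the function in a static nested (or top-level) class whose only state is explicit, serializable fields; unlike an anonymous class, it carries no hidden copies of locals or of an enclosing instance. The sketch below illustrates that one pattern, not a complete list of rules. It assumes a hypothetical SerializableFilter in place of Spark's Function, and uses a serialize/deserialize round trip to roughly mimic shipping the function to an executor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StaticFilterDemo {
    // Hypothetical stand-in for Spark's Function, which extends Serializable.
    interface SerializableFilter extends Serializable {
        boolean call(String s);
    }

    // A static nested class has no reference to an enclosing instance;
    // its only state is the explicit field below.
    static final class ContainsWord implements SerializableFilter {
        private static final long serialVersionUID = 1L;
        private final String word;

        ContainsWord(String word) {
            this.word = word;
        }

        @Override
        public boolean call(String s) {
            return s.contains(word);
        }
    }

    // Serializes and deserializes an object in memory.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SerializableFilter copy = roundTrip(new ContainsWord("good"));
        System.out.println(copy.call("a good movie")); // true
        System.out.println(copy.call("a bad movie"));  // false
    }
}
```

In Spark, an instance such as new ContainsWord(value) would be passed to filter in place of the anonymous class, assuming it implements Spark's Function rather than the stand-in interface.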