Task not serializable exception while running Apache Spark job

谎友^ 2020-12-23 14:48

The following Java program was written to experiment with Apache Spark.

The program tries to read a list of positive and negative words from a respective file, compa

2 answers
  •  野趣味 (original poster)
    2020-12-23 15:41

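    When you create an anonymous class that uses a local variable from the enclosing method, the compiler stores that variable in a hidden field of the generated class. Take this code: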

    JavaRDD<String> numAs = positiveComments.filter(new Function<String, Boolean>()
          {
            public Boolean call(String s)
            {
              // 'iterator' is a local variable of the enclosing method
              return s.contains(iterator.next());
            }
          });
    

    Conceptually, the compiler rewrites it as:

    JavaRDD<String> numAs = positiveComments.filter(new Function<String, Boolean>()
          {
            // hidden field holding the captured local variable
            private Iterator<...> $iterator;
            public Boolean call(String s)
            {
              return s.contains($iterator.next());
            }
          });
    

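    This is why you get a NotSerializableException: Spark has to serialize the function object to send it to the executors, the captured iterator is serialized along with it as a field, and the Iterator is not serializable.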
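    The hidden field can be observed with plain reflection, without Spark. The following is a minimal sketch: `CapturedFieldDemo`, `StringPredicate`, and the sample words are hypothetical names made up for illustration, with `StringPredicate` standing in for Spark's `Function<String, Boolean>`.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CapturedFieldDemo {
    // Hypothetical stand-in for Spark's Function<String, Boolean> (assumption, not Spark API).
    public interface StringPredicate extends Serializable {
        Boolean call(String s);
    }

    // Builds an anonymous class that captures the local variable 'iterator'.
    public static StringPredicate capturingIterator() {
        final Iterator<String> iterator = Arrays.asList("good", "great").iterator();
        return new StringPredicate() {
            public Boolean call(String s) {
                return s.contains(iterator.next());
            }
        };
    }

    // Lists the type names of all fields declared on the object's class,
    // including the synthetic ones the compiler generates for captured variables.
    public static List<String> declaredFieldTypes(Object o) {
        List<String> types = new ArrayList<String>();
        for (Field f : o.getClass().getDeclaredFields()) {
            types.add(f.getType().getName());
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(declaredFieldTypes(capturingIterator()));
    }
}
```

    The printed list should include `java.util.Iterator`, even though the anonymous class declares no field in its source. The exact name of the generated field is a compiler detail, so the sketch only looks at field types.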

    To avoid the exception, call next() before creating the anonymous class, so that only the resulting String is captured:

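    // final is required before Java 8; String is serializable, so capturing it is safe
    final String value = iterator.next();
    JavaRDD<String> numAs = positiveComments.filter(new Function<String, Boolean>()
          {
            public Boolean call(String s)
            {
              return s.contains(value);
            }
          });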
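
    The difference between the two variants can be checked with standard Java serialization, which is what Spark uses for functions by default. This is only a sketch under assumptions: `CaptureSerializationDemo`, `StringPredicate`, and the sample words are hypothetical, and `StringPredicate` again stands in for Spark's `Function<String, Boolean>`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.Iterator;

public class CaptureSerializationDemo {
    // Hypothetical stand-in for Spark's Function<String, Boolean> (assumption, not Spark API).
    public interface StringPredicate extends Serializable {
        Boolean call(String s);
    }

    // Captures the Iterator itself: the hidden field is not serializable.
    public static StringPredicate capturingIterator() {
        final Iterator<String> iterator = Arrays.asList("good", "great").iterator();
        return new StringPredicate() {
            public Boolean call(String s) {
                return s.contains(iterator.next());
            }
        };
    }

    // Captures only the String obtained from next(): the hidden field is serializable.
    public static StringPredicate capturingValue() {
        final Iterator<String> iterator = Arrays.asList("good", "great").iterator();
        final String value = iterator.next();
        return new StringPredicate() {
            public Boolean call(String s) {
                return s.contains(value);
            }
        };
    }

    // Returns true if the object can be written with Java serialization,
    // false if writing fails with NotSerializableException.
    public static boolean isJavaSerializable(Object o) {
        try {
            new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isJavaSerializable(capturingIterator()));
        System.out.println(isJavaSerializable(capturingValue()));
    }
}
```

    The first call should report that the object cannot be serialized and the second that it can. Note that the fixed version compares every comment against a single word, the one returned by the first call to next().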
    
