java.lang.ClassCastException using lambda expressions in spark job on remote server

Holger

What you have here is a follow-up error which masks the original error.

When lambda instances are serialized, they use writeReplace to replace their JRE-specific implementation with a persistent form, a SerializedLambda instance. When the SerializedLambda instance has been restored, its readResolve method will be invoked to reconstitute the appropriate lambda instance. As the documentation says, it will do so by invoking a special method of the class which defined the original lambda (see also this answer). The important point is that the original class is needed, and that's what's missing in your case.
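
To make this mechanism visible, here is a minimal, self-contained sketch (the class name PeekSerializedForm is mine): it reflectively calls the compiler-generated writeReplace method of a serializable lambda and prints the capturing class that readResolve will later need.

import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;

public class PeekSerializedForm {
    public static void main(String[] args) throws Exception {
        Runnable r = (Runnable & Serializable) () -> {};
        // serializable lambdas get a compiler-generated writeReplace method
        // that returns the persistent form, a SerializedLambda instance
        Method writeReplace = r.getClass().getDeclaredMethod("writeReplace");
        writeReplace.setAccessible(true);
        SerializedLambda sl = (SerializedLambda) writeReplace.invoke(r);
        // at deserialization time, readResolve needs this class to be present
        System.out.println("capturing class: " + sl.getCapturingClass());
        System.out.println("impl method: " + sl.getImplMethodName());
    }
}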

But there's a …special… behavior of ObjectInputStream. When it encounters an exception, it doesn't bail out immediately. It records the exception and continues the process, marking all objects currently being read, and thus depending on the erroneous object, as erroneous as well. Only at the end of the process will it throw the original exception it encountered. What makes it so strange is that it will also keep trying to set the fields of these objects. But when you look at the method ObjectInputStream.readOrdinaryObject, line 1806:

…
    if (obj != null &&
        handles.lookupException(passHandle) == null &&
        desc.hasReadResolveMethod())
    {
        Object rep = desc.invokeReadResolve(obj);
        if (unshared && rep.getClass().isArray()) {
            rep = cloneArray(rep);
        }
        if (rep != obj) {
            handles.setObject(passHandle, obj = rep);
        }
    }

    return obj;
}

you see that it doesn't call the readResolve method when lookupException reports a non-null exception. But when that substitution did not happen, it's not a good idea to continue trying to set the field values of the referrer, yet that's exactly what happens here, hence producing the ClassCastException.

You can easily reproduce the problem:

// each class below goes into its own file in package test
// (matching the stack trace further down) and needs: import java.io.*;
public class Holder implements Serializable {
    Runnable r;
}
public class Defining {
    public static Holder get() {
        final Holder holder = new Holder();
        holder.r=(Runnable&Serializable)()->{};
        return holder;
    }
}
public class Writing {
    static final File f=new File(System.getProperty("java.io.tmpdir"), "x.ser");
    public static void main(String... arg) throws IOException {
        try(FileOutputStream os=new FileOutputStream(f);
            ObjectOutputStream   oos=new ObjectOutputStream(os)) {
            oos.writeObject(Defining.get());
        }
        System.out.println("written to "+f);
    }
}
public class Reading {
    static final File f=new File(System.getProperty("java.io.tmpdir"), "x.ser");
    public static void main(String... arg) throws IOException, ClassNotFoundException {
        try(FileInputStream is=new FileInputStream(f);
            ObjectInputStream ois=new ObjectInputStream(is)) {
            Holder h=(Holder)ois.readObject();
            System.out.println(h.r);
            h.r.run();
        }
        System.out.println("read from "+f);
    }
}

Compile these four classes and run Writing. Then delete the class file Defining.class and run Reading; you will get:

Exception in thread "main" java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field test.Holder.r of type java.lang.Runnable in instance of test.Holder
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2089)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)

(Tested with 1.8.0_20)


The bottom line is that you can forget about this serialization issue once you understand what's happening: all you have to do to solve your problem is make sure that the class which defined the lambda expression is also available in the runtime where the lambda is deserialized.

Example for a Spark job run directly from the IDE (spark-submit distributes the jar by default):

SparkConf sconf = new SparkConf()
  .set("spark.eventLog.dir", "hdfs://nn:8020/user/spark/applicationHistory")
  .set("spark.eventLog.enabled", "true")
  .setJars(new String[]{"/path/to/jar/with/your/class.jar"})
  .setMaster("spark://spark.standalone.uri:7077");
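
A hedged usage sketch (the variable name jsc is mine): constructing the context from this conf is what actually distributes the listed jars to the executors, so the lambda-defining class is on the worker classpath when tasks are deserialized.

// org.apache.spark.api.java.JavaSparkContext; the jars from setJars()
// are shipped to the executors when the context is created
JavaSparkContext jsc = new JavaSparkContext(sconf);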

I suppose your problem is failed auto-boxing. In the code

x -> {
      return true;
}

you pass a (String -> boolean) lambda (a Predicate<String>) while the filter method takes a (String -> Boolean) lambda (a Function<String, Boolean>). So I suggest changing the code to

x -> {
      return Boolean.TRUE;
}
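
For context, Spark's Java API declares filter as taking an org.apache.spark.api.java.function.Function<T, Boolean> rather than a java.util.function.Predicate, so the fully explicit form of this suggestion would look like the following sketch (lines is a hypothetical JavaRDD<String>):

JavaRDD<String> kept = lines.filter(new Function<String, Boolean>() {
    @Override
    public Boolean call(String x) {
        return Boolean.TRUE; // explicit Boolean, no auto-boxing involved
    }
});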

Please include details in your question. Output from uname -a and java -version is appreciated. Provide an SSCCE if possible.

I had the same error, and replacing the lambda with an inner class made it work. I don't really understand why, and reproducing this error was extremely difficult (we had one server which exhibited the behavior, and nowhere else).

Causes serialization problems (uses a lambda, triggering the SerializedLambda error):

this.variable = () -> { ..... }

Yields java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field MyObject.val$variable

Works:

this.variable = new MyInterface() {
    public void myMethod() {
       .....
    }
};

You can perhaps more simply replace your Java 8 lambda with a Spark Function (org.apache.spark.api.java.function.Function in the Java API):

replace

output = rdds.map(x -> this.function(x)).collect();

with:

output = rdds.map(new Function<Double, Double>() {
    @Override
    public Double call(Double x) {
        return MyClass.this.function(x);
    }
}).collect();