flink - using dagger injections - not serializable?

情到浓时终转凉″ submitted on 2019-12-03 06:21:34

Before diving into the specifics of the question, a bit of background on serializability of functions in Apache Flink:

Serializability

Apache Flink uses Java Serialization (java.io.Serializable) to ship the function objects (here the MapFunction) to the workers that execute them in parallel. Because of that, the functions need to be serializable: a function must not contain any non-serializable fields, i.e. fields whose types are neither primitive (int, long, double, ...) nor implement java.io.Serializable.
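The failure mode can be reproduced without Flink, using plain Java serialization. The following is a minimal sketch; Connection, BrokenMapper and SerializabilityCheck are hypothetical names used only for illustration, and BrokenMapper merely stands in for a Flink function.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a dependency that does not implement Serializable.
class Connection {
    long send(long value) { return value + 1; }
}

// Hypothetical stand-in for a Flink function: Serializable itself,
// but holding a populated non-serializable field.
class BrokenMapper implements Serializable {
    private final Connection connection = new Connection();

    long map(long value) { return connection.send(value); }
}

class SerializabilityCheck {
    // Returns true if Java serialization accepts the whole object graph.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isSerializable(new BrokenMapper())); // prints false
        System.out.println(isSerializable(Long.valueOf(42L)));  // prints true
    }
}
```

The check fails because of the populated connection field, not because of the class itself, which is why the approaches discussed here all aim at keeping such a field empty or excluded at serialization time.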

The typical way to work with non-serializable constructs is to lazily initialize them.

Lazy Initialization

One way to use non-serializable types in Flink functions is to lazily initialize them. The fields that hold these types are still null when the function is serialized to be shipped, and only set after the function has been deserialized by the workers.

  • In Scala, you can simply use lazy fields, for example lazy val x = new NonSerializableType(). The NonSerializableType instance is only created upon first access to the variable x, which usually happens on the worker. Consequently, the type can be non-serializable, because x is still null when the function is serialized for shipping to the workers. In practice the field is often declared as @transient lazy val, so that nothing is shipped even if x was already accessed before serialization.

  • In Java, you can initialize the non-serializable fields in the open() method of the function, if you make it a Rich Function. Rich functions (like RichMapFunction) are extended versions of basic functions (here MapFunction) and give you access to life-cycle methods like open() and close().
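The open() pattern can be sketched in plain Java, without Flink on the classpath. This is an illustration under assumptions: Client, LazyMapper and LazyInitDemo are hypothetical names, LazyMapper only mimics the shape of a rich function, and ship() simulates the transfer to a worker with a serialization round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical dependency that does not implement Serializable.
class Client {
    long doSomething(long value) { return value * 2; }
}

// Mimics the shape of a rich function: the field stays null until open() runs.
class LazyMapper implements Serializable {
    // Null during serialization, so nothing non-serializable is written.
    private Client client;

    // Stands in for the life-cycle method that runs once on the worker.
    void open() { client = new Client(); }

    boolean isOpen() { return client != null; }

    long map(long value) { return client.doSomething(value); }
}

class LazyInitDemo {
    // Serializes and deserializes the function, simulating shipping to a worker.
    static LazyMapper ship(LazyMapper function) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(function);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (LazyMapper) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LazyMapper onWorker = ship(new LazyMapper()); // succeeds: client is still null
        onWorker.open();                              // dependency is created after shipping
        System.out.println(onWorker.map(21L));        // prints 42
    }
}
```

In a real job the class would extend RichMapFunction and Flink would call open() itself. Declaring the field transient as well is a common extra safeguard.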

Lazy Dependency Injections

I am not too familiar with dependency injection, but Dagger seems to provide something like a lazy dependency as well, which may help as a workaround, much like lazy variables in Scala:

new MapFunction<Long, Long>() {

  // Resolved on the first call to dep.get(), not when the function is created
  @Inject Lazy<MyDependency> dep;

  @Override
  public Long map(Long value) {
    return dep.get().doSomething(value);
  }
}
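Independently of Dagger, the same create-on-first-access idea can be written by hand in plain Java. The sketch below is not Dagger's Lazy; LazyHolder, MyDependencyHolder and LazyHolderDemo are hypothetical names, and MyDependency is a made-up stand-in for the dependency used above.

```java
import java.io.Serializable;

// Hypothetical dependency that does not implement Serializable.
class MyDependency {
    static int created = 0; // counts constructions, for illustration only

    MyDependency() { created++; }

    long doSomething(long value) { return value + 1; }
}

// Hand-rolled lazy holder: the value is built on the first get() and is
// transient, so it is never written during serialization.
abstract class LazyHolder<T> implements Serializable {
    private transient T value;

    protected abstract T create();

    synchronized T get() {
        if (value == null) {
            value = create();
        }
        return value;
    }
}

// Named subclass instead of a lambda, so the holder serializes like any other class.
class MyDependencyHolder extends LazyHolder<MyDependency> {
    @Override
    protected MyDependency create() { return new MyDependency(); }
}

class LazyHolderDemo {
    public static void main(String[] args) {
        LazyHolder<MyDependency> dep = new MyDependencyHolder();
        System.out.println(MyDependency.created);       // prints 0: nothing built yet
        System.out.println(dep.get().doSomething(41L)); // prints 42
        dep.get();
        System.out.println(MyDependency.created);       // prints 1: built only once
    }
}
```

A function could hold such a holder as an ordinary field and call get() inside map(), which is roughly what a lazy val does in Scala.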

I faced a similar issue. There are two ways to keep your dependency from being serialized.

  1. Make your dependency static: static fields are not part of an object's serialized state. This is not always possible, though, and it can also hurt your code design.

  2. Use transient: by declaring your dependency as transient you are saying that it is not part of the persistent state of the object and should not be serialized. A transient field is null after deserialization, so it has to be re-created on the worker before use.

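import java.io.Serializable;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class ClassA implements Serializable {
  // class A code here
}

public class ClassB {
  // class B code here
}

public class MySinkFunction implements SinkFunction<MyData> {
  // Serialized and shipped together with the function
  private ClassA mySerializableDependency;
  // Skipped by Java serialization; null on the worker until re-created
  private transient ClassB nonSerializableDependency;
}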

This is especially useful when you are using external libraries, whose implementations cannot be changed by you to make them serializable.
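What transient and static fields do during a serialization round trip can be checked with plain Java. This is a sketch under assumptions: TransientHolder and TransientDemo are hypothetical names, ClassB here is a made-up stand-in with a greet() method, and the round trip only simulates shipping to a worker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical non-serializable dependency.
class ClassB {
    String greet() { return "hello"; }
}

class TransientHolder implements Serializable {
    // static: belongs to the class, not to any instance's serialized state
    static ClassB shared = new ClassB();
    // transient: skipped by serialization even though it is set here
    private transient ClassB dependency = new ClassB();

    boolean hasDependency() { return dependency != null; }

    // Re-creates the dependency if it is missing, e.g. after deserialization.
    ClassB dependency() {
        if (dependency == null) {
            dependency = new ClassB();
        }
        return dependency;
    }
}

class TransientDemo {
    // Serializes and deserializes the holder, simulating shipping to a worker.
    static TransientHolder roundTrip(TransientHolder holder) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(holder);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TransientHolder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TransientHolder copy = roundTrip(new TransientHolder());
        System.out.println(copy.hasDependency());      // prints false
        System.out.println(copy.dependency().greet()); // prints hello
    }
}
```

The copy comes back with a null field because field initializers do not run again when a Serializable class is deserialized. In a Flink function, open() would be the natural place for the re-creation.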
