In Spark, what is the right way to have a static object on all workers?

走了就别回头了 2020-12-13 07:38

I've been looking at the documentation for Spark and it mentions this:

Spark’s API relies heavily on passing functions in the driver program to run

1 Answer
  • 2020-12-13 08:39

    This is less a question about Spark and more a question of how Scala generates code. Remember that a Scala object compiles to a singleton class whose only instance is held in a static field, so it behaves much like a Java class full of static methods. Consider a simple example like this:

    object foo {
    
      val value = 42
    
      def func(i: Int): Int = i + value
    
      def main(args: Array[String]): Unit = {
        println(Seq(1, 2, 3).map(func).sum)
      }
    
    }
    

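    That will be translated to 3 JVM classes; one of them will be the closure that is passed as a parameter to the map method. (This is how Scala 2.11 and earlier compile closures; from Scala 2.12 on, the closure becomes a lambda rather than a separate class file, but it still captures nothing.) Using javap on that class yields something like this: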

    public final class foo$$anonfun$main$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable {
      public static final long serialVersionUID;
      public final int apply(int);
      public int apply$mcII$sp(int);
      public final java.lang.Object apply(java.lang.Object);
      public foo$$anonfun$main$1();
    }
    

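    Note that the class has no fields. If you look at the disassembled bytecode, all it does is call the func() method. When running in Spark, an instance of this class is what gets serialized and shipped to the executors; since it has no fields, there is almost nothing to serialize. The object foo itself is never sent; each executor JVM loads and initializes its own copy.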
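
    To make that concrete, here is a minimal sketch in plain Scala (no Spark needed), assuming Scala 2.12 or later. The names Holder and ClosureDemo are made up for illustration. The singleton holds a field that is not serializable, yet the closure that calls into it still serializes, because the closure captures nothing:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical singleton holding state that is NOT serializable.
object Holder {
  // A plain java.lang.Object does not implement Serializable.
  val resource: AnyRef = new Object
  val value: Int = 42
  def func(i: Int): Int = i + value
}

object ClosureDemo {
  // The lambda reaches Holder through its static singleton field,
  // so it captures nothing and carries no state of its own.
  val closure: Int => Int = (i: Int) => Holder.func(i)

  // Serialize with plain Java serialization.
  def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.toByteArray
  }

  def main(args: Array[String]): Unit = {
    // Succeeds even though Holder.resource is not serializable.
    val payload = serialize(closure)
    println(s"closure serialized to ${payload.length} bytes")
    println(closure(1))
  }
}
```

    If the closure captured Holder.resource in a local variable instead of going through the object, serialization would fail with a NotSerializableException.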

    As for your actual question, how to initialize static objects: you can write an idempotent initialization function and call it at the start of your closures. The first call on each executor triggers initialization; subsequent calls are no-ops. Cleanup, though, is a lot trickier, since I'm not aware of an API that does something like "run this code on all executors".
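
    A minimal sketch of such an idempotent initializer, in plain Scala so it runs without a cluster. WorkerState, ensureInitialized and the lookup table are hypothetical names and data chosen for illustration; the closure stands in for the function you would pass to an RDD's map:

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical per-JVM singleton; on a cluster each executor JVM has its own copy.
object WorkerState {
  // Counts how often the setup really ran (for demonstration only).
  val initCount = new AtomicInteger(0)

  private var table: Map[Int, String] = Map.empty
  private var initialized = false

  // Idempotent: the first call does the work, later calls return immediately.
  def ensureInitialized(): Unit = synchronized {
    if (!initialized) {
      table = Map(1 -> "one", 2 -> "two", 3 -> "three") // stand-in for expensive setup
      initCount.incrementAndGet()
      initialized = true
    }
  }

  def lookup(i: Int): String = synchronized { table.getOrElse(i, "unknown") }
}

object InitDemo {
  // Stand-in for a closure passed to map; it captures nothing.
  val closure: Int => String = { i =>
    WorkerState.ensureInitialized()
    WorkerState.lookup(i)
  }

  def main(args: Array[String]): Unit = {
    println(Seq(1, 2, 3, 4).map(closure))
    println(WorkerState.initCount.get()) // setup ran once despite four calls
  }
}
```

    The synchronized block matters because an executor may run several tasks in the same JVM at once. A lazy val inside the object gives the same once-per-JVM guarantee with less code when the setup is a single expression.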

    One approach that can be useful if you need cleanup is explained in this blog, in the "setup() and cleanup()" section.
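
    I can't reproduce the blog's code here, but one common shape for per-partition setup and cleanup looks like the sketch below. It is plain Scala over an Iterator, which is the shape of function that mapPartitions takes; Resource and PartitionDemo are hypothetical names, and the doubling is just a placeholder transformation:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical resource standing in for a connection or file handle.
final class Resource {
  var closed = false
  def transform(i: Int): Int = i * 2
  def close(): Unit = { closed = true }
}

object PartitionDemo {
  // Records created resources so the demo can check that cleanup ran.
  val opened = ArrayBuffer.empty[Resource]

  // Iterator in, Iterator out, like a function passed to mapPartitions.
  def process(partition: Iterator[Int]): Iterator[Int] = {
    if (!partition.hasNext) Iterator.empty // empty partition: nothing to set up
    else {
      val resource = new Resource          // setup, once per partition
      opened += resource
      partition.map { i =>
        val out = resource.transform(i)
        if (!partition.hasNext) resource.close() // cleanup after the last element
        out
      }
    }
  }

  def main(args: Array[String]): Unit = {
    println(process(Iterator(1, 2, 3)).toList)
    println(opened.forall(_.closed))
  }
}
```

    The caveat is that cleanup only runs if the returned iterator is consumed to the end; if a task fails or stops early, the resource stays open.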

    EDIT: just for clarification, here's the disassembly of the method that actually makes the call.

    public int apply$mcII$sp(int);
      Code:
       0:   getstatic       #29; //Field foo$.MODULE$:Lfoo$;
       3:   iload_1
       4:   invokevirtual   #32; //Method foo$.func:(I)I
       7:   ireturn
    

    See how it just references the static field holding the singleton and calls the func() method.
