How to set and get static variables from Spark?

孤独总比滥情好 2020-12-06 07:09

I have a class like this:

public class Test {
    private static String name;

    public static String getName() {
        return name;
    }

    public stati         


        
4 Answers
  •  囚心锁ツ
    2020-12-06 07:35

    OK, there are basically two ways to get a value known to the master over to the executors:

    1. Put the value inside a closure that is serialized and sent to the executors to perform a task. This is the most common approach, and it is simple and elegant.
    2. Create a broadcast variable with the data. This is good for large immutable data, where you want to guarantee it is sent only once. It is also good if the same data is used over and over.
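
    To make option 1 concrete, here is a minimal sketch in plain Java with no Spark involved: ordinary Java serialization stands in for the way a task is shipped to an executor. The class names (ClosureShippingDemo, Settings, CapturingTask, StaticReadingTask) are hypothetical, and resetting the static field to null only simulates a fresh executor JVM in which the setter never ran. It illustrates that a value held by the task object travels with it, while a static field does not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

final class ClosureShippingDemo {
    // Serialize an object to bytes, as a stand-in for shipping a task.
    static byte[] serialize(Object o) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild the object from bytes, as a stand-in for the executor side.
    @SuppressWarnings("unchecked")
    static <T> T deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns { result of the capturing task, result of the static-reading task }.
    static String[] run() {
        Settings.name = "spark";                      // the "driver" sets the static
        byte[] capturing = serialize(new CapturingTask(Settings.name));
        byte[] staticReading = serialize(new StaticReadingTask());
        Settings.name = null;                         // assumption: simulates a fresh executor JVM
        Function<String, String> a = deserialize(capturing);
        Function<String, String> b = deserialize(staticReading);
        return new String[] { a.apply("row1"), b.apply("row1") };
    }

    public static void main(String[] args) {
        String[] results = run();
        System.out.println(results[0]);
        System.out.println(results[1]);
    }
}

// Hypothetical class with a static field; static state is per JVM and is not serialized.
final class Settings {
    static String name;
}

// Carries its own copy of the value, like a closure capturing a local variable.
final class CapturingTask implements Function<String, String>, Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;

    CapturingTask(String name) {
        this.name = name;
    }

    @Override
    public String apply(String row) {
        return name + ":" + row;
    }
}

// Reads the static field at run time instead of carrying the value.
final class StaticReadingTask implements Function<String, String>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public String apply(String row) {
        return Settings.name + ":" + row;
    }
}
```

    Running main prints spark:row1 for the capturing task and null:row1 for the one that reads the static. With a broadcast variable, the task would carry the small broadcast handle rather than the data itself.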

    No need to use static variables in either case. But, if you DO want to have static values available on your executor VMs, you need to do one of these:

    1. If the values are fixed or the configuration is available on the executor nodes (lives inside the jar, etc.), then you can use a lazy val, which guarantees initialization only once.
    2. You can call mapPartitions() with code that uses one of the two options above, then store the values in your static variable/object. The function passed to mapPartitions is invoked once per partition (much better than once per row), which makes it a good fit for this kind of thing (initializing DB connections, etc.).
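
    The once-per-partition pattern in option 2 can be sketched without Spark as follows. This is only an illustration under stated assumptions: the nested lists stand in for partitions, processPartition plays the role of the function you would pass to mapPartitions (which, in Spark's Java API, also receives an Iterator over the partition), and ExecutorState is a hypothetical static holder with a counter added purely to show how often initialization runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class PartitionInitDemo {
    // Hypothetical executor-side static state. It is volatile because several
    // tasks may run concurrently inside one executor JVM.
    static final class ExecutorState {
        static volatile String name;
        static final AtomicInteger initCalls = new AtomicInteger();
    }

    // Stand-in for the function passed to mapPartitions: it sees a whole
    // partition as one iterator, so setup happens once per partition.
    static Iterator<String> processPartition(String shippedName, Iterator<String> rows) {
        ExecutorState.name = shippedName;            // store the shipped value in the static
        ExecutorState.initCalls.incrementAndGet();   // count setups for the demo
        List<String> out = new ArrayList<>();
        while (rows.hasNext()) {
            out.add(ExecutorState.name + ":" + rows.next());
        }
        return out.iterator();
    }

    // Applies processPartition to every simulated partition, in order.
    static List<String> run(List<List<String>> partitions, String name) {
        List<String> result = new ArrayList<>();
        for (List<String> partition : partitions) {
            processPartition(name, partition.iterator()).forEachRemaining(result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d", "e"));
        System.out.println(run(partitions, "spark"));
        System.out.println("init calls: " + ExecutorState.initCalls.get());
    }
}
```

    With five rows in two partitions, the setup code runs twice rather than five times.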

    Hope this helps!

    P.S. As for your exception: I just don't see it in that code sample; my bet is that it is occurring elsewhere.


    Edit for extra clarification: The lazy val solution is simply Scala, no Spark involved...

    object MyStaticObject
    {
      lazy val MyStaticValue = {
         // Call a database, read a file included in the Jar, do expensive initialization computation, etc
         4
      }
    } 
    

    Since each executor corresponds to a JVM, MyStaticObject will be initialized once its class is loaded on that executor. The lazy keyword guarantees that MyStaticValue is initialized only the first time it is actually requested, and that it holds its value from then on.
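
    For a Java class like the one in the question, a rough counterpart of lazy val is the initialization-on-demand holder idiom, which relies on the JVM initializing a class only once, on first use. The sketch below is an assumption about how one might port the Scala object, not part of the original answer; initCount is a hypothetical counter that exists only to show that the initializer runs a single time.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class MyStaticObject {
    // Demo-only counter: how many times the expensive initializer ran.
    static final AtomicInteger initCount = new AtomicInteger();

    // The nested class is not initialized until Holder.VALUE is first read.
    private static final class Holder {
        static final int VALUE = compute();
    }

    private static int compute() {
        // Call a database, read a file included in the jar, etc.
        initCount.incrementAndGet();
        return 4;
    }

    // First call triggers Holder's initialization; later calls reuse the value.
    static int myStaticValue() {
        return Holder.VALUE;
    }

    public static void main(String[] args) {
        System.out.println(myStaticValue());
        System.out.println(myStaticValue());
        System.out.println("initialized " + initCount.get() + " time(s)");
    }
}
```

    Each executor JVM would run the initializer for itself on first access, so the value has to be something every executor can compute or load on its own.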
