SparkContext setLocalProperties

野趣味 2020-12-18 04:35

As a continuation of this question, could you please tell me which properties I can change with SparkContext.setLocalProperties? Could I change cores, RAM, etc.?

3 answers
  • 2020-12-18 04:49

    According to the documentation, localProperties is a protected[spark] property of SparkContext: these are the properties through which you can create logical job groups. They are also inheritable thread-local variables, which means they are used in preference to ordinary thread-local variables when the per-thread attribute held in the variable must be automatically transmitted to any child threads that are created. Local properties are propagated to the workers when SparkContext is asked to run or submit a Spark job, and it in turn passes them along to the DAGScheduler.
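
The inheritance behaviour described above can be illustrated with the JDK alone. This is a minimal sketch, not Spark code: `LocalPropsDemo`, `LOCAL`, `readFromChild` and `readFromSibling` are hypothetical names, and `LOCAL` only imitates the role of SparkContext's `localProperties` field. A thread created after a property is set starts with a copy of it, while a thread that was not created by the setting thread does not see it.

```java
import java.util.Properties;

// Hypothetical stand-in for SparkContext.localProperties; Spark is not used here.
public class LocalPropsDemo {
    // Every thread gets its own Properties object; a new thread starts with a copy
    // of the Properties of the thread that created it.
    static final InheritableThreadLocal<Properties> LOCAL = new InheritableThreadLocal<Properties>() {
        @Override
        protected Properties initialValue() {
            return new Properties();
        }

        @Override
        protected Properties childValue(Properties parent) {
            // Copy, so later changes in the parent are not visible to the child.
            Properties copy = new Properties();
            copy.putAll(parent);
            return copy;
        }
    };

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Sets key=value in the calling thread, then reads the key from a child thread.
    static String readFromChild(String key, String value) {
        LOCAL.get().setProperty(key, value);
        String[] seen = new String[1];
        Thread child = new Thread(() -> seen[0] = LOCAL.get().getProperty(key));
        child.start();
        joinQuietly(child);
        return seen[0];
    }

    // Sets key=value in one thread, then reads the key from a thread it did not create.
    static String readFromSibling(String key, String value) {
        String[] seen = new String[1];
        Thread writer = new Thread(() -> LOCAL.get().setProperty(key, value));
        writer.start();
        joinQuietly(writer);
        Thread reader = new Thread(() -> seen[0] = LOCAL.get().getProperty(key));
        reader.start();
        joinQuietly(reader);
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("child sees: " + readFromChild("spark.scheduler.pool", "poolA"));
        System.out.println("sibling sees: " + readFromSibling("spark.sql.execution.id", "42"));
    }
}
```

Copying in `childValue` keeps a child's later changes from leaking back into the parent and the other way round; as far as I know, recent Spark versions also clone the parent's properties for child threads, but check the source of the version you run.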

    In general, local properties are used to group jobs into pools in the FAIR job scheduler through the per-thread spark.scheduler.pool property, and in the method SQLExecution.withNewExecutionId to set spark.sql.execution.id.
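
A minimal sketch of the pool mechanism with the Java API, assuming a local master: `spark.scheduler.pool` is the property named above, and `setJobGroup`/`clearJobGroup` are JavaSparkContext methods that store their values as local properties too. `PoolAndGroupDemo`, `countInPool`, `poolA` and `group-1` are hypothetical names for illustration. Without a fairscheduler.xml allocation file, Spark should create the pool on the fly with default settings and log a warning.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical example: run one job from the calling thread in a given FAIR pool and job group.
public class PoolAndGroupDemo {
    static long countInPool(JavaSparkContext sc, String pool, String groupId) {
        // Per-thread: only jobs submitted from this thread go to this pool.
        sc.setLocalProperty("spark.scheduler.pool", pool);
        // setJobGroup also works through local properties of the calling thread.
        sc.setJobGroup(groupId, "demo jobs for " + pool);
        try {
            return sc.parallelize(Arrays.asList(1, 2, 3, 4, 5)).count();
        } finally {
            // A null value removes the property for this thread.
            sc.setLocalProperty("spark.scheduler.pool", null);
            sc.clearJobGroup();
        }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("local[2]")
                .setAppName("FAIR pool demo")
                .set("spark.scheduler.mode", "FAIR");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            System.out.println(countInPool(sc, "poolA", "group-1"));
        }
    }
}
```

Resetting the properties in `finally` matters because they stick to the thread: a reused thread (for example from a thread pool) would otherwise keep submitting later jobs to the same pool and group.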

    I have no experience assigning thread-local properties in a standalone Spark cluster; it is worth trying and checking.

  • 2020-12-18 04:50

    Local properties provide an easy mechanism to pass (user-defined) configuration from the driver to the executors. You can use TaskContext on the executor to access them. One example of this is the SQL execution ID.
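
A minimal sketch of reading a local property inside a task, assuming a local master; the key `myapp.tenant` and the names `TaskContextPropsDemo` and `readOnExecutors` are hypothetical. `TaskContext.get()` returns the context of the task that is currently running, and its `getLocalProperty` reads the value that was set on the driver thread which submitted the job.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.TaskContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class TaskContextPropsDemo {
    // Hypothetical, user-defined key; any string can be used as a local property key.
    static final String KEY = "myapp.tenant";

    static List<String> readOnExecutors(JavaSparkContext sc, String value) {
        // Set on the driver thread that submits the job.
        sc.setLocalProperty(KEY, value);
        JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3, 4), 2);
        // Inside each task, read the property from the task's own context.
        return nums.map(n -> TaskContext.get().getLocalProperty(KEY)).collect();
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("TaskContext props");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            System.out.println(readOnExecutors(sc, "tenant-42"));
        }
    }
}
```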

  • 2020-12-18 05:03

    I did some testing with the property spark.executor.memory (the available properties are listed in the Spark configuration documentation). On a very simple local Spark setup, starting two threads, each with a different setting, the settings do seem to be confined to their threads. Using the code at the end of this post (probably not code you would deploy to production), with the threads interleaved to make sure the result is not just scheduling luck, I get the following output (Spark's own log output removed):

    Thread 1 Before sleeping mem: 512
    Thread 2 Before sleeping mem: 1024
    Thread 1 After sleeping mem: 512
    Thread 2 After sleeping mem: 1024
    

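    It is neat to observe that a property declared in a thread stays inside that thread, although I am fairly sure this can easily lead to nonsensical situations, so I would still recommend caution before applying such techniques. Note, however, that this only shows that the value returned by getLocalProperty is confined to each thread; it does not show that the executors ran with a different amount of memory.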

    import java.util.Arrays;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.FlatMapFunction;

    import scala.Tuple2;

    public class App {
        private static JavaSparkContext sc;

        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setMaster("local")
                    .setAppName("Testing App");
            sc = new JavaSparkContext(conf);
            SparkThread thread1 = new SparkThread(1);
            SparkThread thread2 = new SparkThread(2);
            ExecutorService executor = Executors.newFixedThreadPool(2);
            Future<?> completion1 = executor.submit(thread1);
            try {
                // Start the second thread 5 s later so that the two threads interleave.
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            Future<?> completion2 = executor.submit(thread2);
            try {
                completion1.get();
                completion2.get();
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            } finally {
                // Without shutdown() the pool's non-daemon threads keep the JVM alive.
                executor.shutdown();
                sc.close();
            }
        }

        private static class SparkThread implements Runnable {
            private final int i;

            public SparkThread(int i) {
                this.i = i;
            }

            @Override
            public void run() {
                int mem = 512;
                // Local properties are per thread: each worker thread sets its own value.
                sc.setLocalProperty("spark.executor.memory", Integer.toString(mem * i));
                // Reads the input file test1 or test2 from the working directory.
                JavaRDD<String> input = sc.textFile("test" + i);

                FlatMapFunction<String, String> tt = s -> Arrays.asList(s.split(" ")).iterator();
                JavaRDD<String> words = input.flatMap(tt);
                System.out.println("Thread " + i + " Before sleeping mem: " + sc.getLocalProperty("spark.executor.memory"));

                try {
                    // Sleep so that the other thread sets its own property in the meantime.
                    Thread.sleep(7000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // Do some work: a word count written to output1 or output2.
                JavaPairRDD<String, Integer> ones = words.mapToPair(t -> new Tuple2<>(t, 1));
                JavaPairRDD<String, Integer> counts = ones.reduceByKey(Integer::sum);

                counts.saveAsTextFile("output" + i);
                System.out.println("Thread " + i + " After sleeping mem: " + sc.getLocalProperty("spark.executor.memory"));
            }
        }
    }
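
One way to check whether setting spark.executor.memory as a local property changes the application's configuration is to compare it with the SparkConf value. The sketch below is a minimal check, assuming a local master; `ConfVsLocalProperty` and `compare` are hypothetical names. `setLocalProperty` only writes to the calling thread's property map, while `getConf()` returns a copy of the configuration the context was started with.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ConfVsLocalProperty {
    // Returns {value in the application's SparkConf, value in this thread's local properties}.
    static String[] compare(JavaSparkContext sc, String key, String localValue) {
        sc.setLocalProperty(key, localValue);
        String confValue = sc.getConf().get(key, "<not set>");
        return new String[] {confValue, sc.getLocalProperty(key)};
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("local")
                .setAppName("conf vs local property")
                .set("spark.executor.memory", "1g");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            String[] values = compare(sc, "spark.executor.memory", "2g");
            System.out.println("SparkConf: " + values[0] + ", local property: " + values[1]);
        }
    }
}
```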
    