SparkContext setLocalProperty

As a continuation of this question, could you please tell me which properties I can change with SparkContext.setLocalProperty? Could I change cores, RAM, etc.?

3 answers

执念已碎

2020-12-18 04:49

According to the documentation, localProperties is a protected[spark] property of SparkContext, and local properties are what you use to create logical job groups. They are implemented as inheritable thread-local variables, which means they are used in preference to ordinary thread-local variables when the per-thread attribute held in the variable must be automatically transmitted to any child threads that are created. Local properties are propagated to the workers when SparkContext is asked to run or submit a Spark job, which in turn passes them along to the DAGScheduler.
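
The inheritance behavior described above can be sketched in plain Java without Spark. This is a minimal model, not Spark's source: it assumes local properties live in an InheritableThreadLocal<Properties> whose childValue gives each new thread a copy of its parent's properties. The names InheritableLocalPropsDemo, setLocalProperty, getLocalProperty and demo are hypothetical stand-ins that only imitate the SparkContext methods.

```java
import java.util.Properties;

public class InheritableLocalPropsDemo {
    // Per-thread Properties; a child thread starts with a copy of its parent's map.
    private static final InheritableThreadLocal<Properties> LOCAL_PROPS =
            new InheritableThreadLocal<Properties>() {
                @Override
                protected Properties initialValue() {
                    return new Properties();
                }

                @Override
                protected Properties childValue(Properties parent) {
                    // Copy rather than share, so changes made in the child stay in the child.
                    Properties copy = new Properties();
                    copy.putAll(parent);
                    return copy;
                }
            };

    static void setLocalProperty(String key, String value) {
        LOCAL_PROPS.get().setProperty(key, value);
    }

    static String getLocalProperty(String key) {
        return LOCAL_PROPS.get().getProperty(key);
    }

    // Returns {value the child saw, value the parent still has after the child finished}.
    static String[] demo() {
        setLocalProperty("spark.scheduler.pool", "reports");
        String[] seenByChild = new String[1];
        Thread child = new Thread(() -> {
            // Inherited from the parent when the Thread object was constructed.
            seenByChild[0] = getLocalProperty("spark.scheduler.pool");
            // Overriding the value here does not affect the parent thread.
            setLocalProperty("spark.scheduler.pool", "adhoc");
        });
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new String[] {seenByChild[0], getLocalProperty("spark.scheduler.pool")};
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("child saw: " + result[0]);
        System.out.println("parent still has: " + result[1]);
    }
}
```

The child inherits "reports" at construction time, but its later override to "adhoc" never reaches the parent. Copying in childValue is the design choice that gives this isolation; returning parent directly would make both threads share and mutate one Properties object.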

In general, local properties are used to group jobs into pools in the FAIR job scheduler via the per-thread spark.scheduler.pool property, and by the method SQLExecution.withNewExecutionId to set spark.sql.execution.id.
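
To make the pool grouping concrete, here is a toy model in plain Java, not Spark's implementation: when a thread runs a job, the job carries a snapshot of that thread's local properties, and a hypothetical FAIR-style scheduler files the job under the pool named by spark.scheduler.pool, falling back to a pool called "default". ToyScheduler, runJob and demo are invented for illustration. In real Spark code the per-thread call would be sc.setLocalProperty("spark.scheduler.pool", "etl") on a context configured with spark.scheduler.mode=FAIR.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class PoolGroupingSketch {
    // Per-thread local properties (a plain ThreadLocal is enough for this sketch).
    private static final ThreadLocal<Properties> LOCAL_PROPS =
            ThreadLocal.withInitial(Properties::new);

    static void setLocalProperty(String key, String value) {
        LOCAL_PROPS.get().setProperty(key, value);
    }

    // Hypothetical scheduler that files each job under the pool named in its properties.
    static final class ToyScheduler {
        private final Map<String, List<String>> pools = new ConcurrentHashMap<>();

        void submit(String jobName, Properties jobProps) {
            String pool = jobProps.getProperty("spark.scheduler.pool", "default");
            pools.computeIfAbsent(pool, k -> new CopyOnWriteArrayList<>()).add(jobName);
        }

        List<String> jobsIn(String pool) {
            return pools.getOrDefault(pool, Collections.emptyList());
        }
    }

    // Hypothetical "run a job": hand the scheduler a snapshot of the caller's local properties.
    static void runJob(ToyScheduler scheduler, String jobName) {
        Properties snapshot = new Properties();
        snapshot.putAll(LOCAL_PROPS.get());
        scheduler.submit(jobName, snapshot);
    }

    static ToyScheduler demo() {
        ToyScheduler scheduler = new ToyScheduler();
        Thread etl = new Thread(() -> {
            setLocalProperty("spark.scheduler.pool", "etl");
            runJob(scheduler, "job-a");
        });
        Thread reports = new Thread(() -> {
            setLocalProperty("spark.scheduler.pool", "reports");
            runJob(scheduler, "job-b");
        });
        etl.start();
        reports.start();
        joinQuietly(etl);
        joinQuietly(reports);
        runJob(scheduler, "job-c"); // the calling thread never set a pool
        return scheduler;
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ToyScheduler scheduler = demo();
        System.out.println("etl: " + scheduler.jobsIn("etl"));
        System.out.println("reports: " + scheduler.jobsIn("reports"));
        System.out.println("default: " + scheduler.jobsIn("default"));
    }
}
```

Jobs submitted from different threads land in different pools even though they go through the same scheduler, which is why the pool is set per thread rather than on the shared context.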

I have no experience setting thread-local properties on a standalone Spark cluster, though; it is worth trying and checking.

生来不讨喜

2020-12-18 04:50

LocalProperties provide an easy mechanism to pass (user-defined) configuration from the driver to the executors. On the executor, you can access them through the TaskContext. One example of this is the SQL execution ID.
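
The driver-to-executor path can be modeled the same way. In this plain-Java sketch (not Spark code), a task receives a copy of the driver's properties when it is submitted, and the task body reads them through a hypothetical ToyTaskContext installed on the worker thread. On a real executor the lookup would go through TaskContext.get().getLocalProperty(key) in recent Spark versions; the class names, demo, and the execution ID values 7 and 8 are illustrative assumptions.

```java
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TaskPropertiesSketch {
    // Hypothetical per-task context; it only imitates the idea of Spark's TaskContext.
    static final class ToyTaskContext {
        private static final ThreadLocal<ToyTaskContext> CURRENT = new ThreadLocal<>();
        private final Properties localProperties;

        private ToyTaskContext(Properties localProperties) {
            this.localProperties = localProperties;
        }

        static ToyTaskContext get() {
            return CURRENT.get();
        }

        String getLocalProperty(String key) {
            return localProperties.getProperty(key);
        }
    }

    // "Driver" side: copy the properties into the task at submission time.
    static Future<String> submitTask(ExecutorService executor, Properties driverProps, String key) {
        Properties snapshot = new Properties();
        snapshot.putAll(driverProps);
        return executor.submit(() -> {
            // "Executor" side: install the task's context, run the body, then clean up.
            ToyTaskContext.CURRENT.set(new ToyTaskContext(snapshot));
            try {
                return ToyTaskContext.get().getLocalProperty(key);
            } finally {
                ToyTaskContext.CURRENT.remove();
            }
        });
    }

    // Returns the value each of two tasks saw for the same key.
    static String[] demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Properties driverProps = new Properties();
            driverProps.setProperty("spark.sql.execution.id", "7");
            Future<String> first = submitTask(executor, driverProps, "spark.sql.execution.id");
            // Changing the driver's value later does not affect the task already submitted.
            driverProps.setProperty("spark.sql.execution.id", "8");
            Future<String> second = submitTask(executor, driverProps, "spark.sql.execution.id");
            return new String[] {first.get(), second.get()};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println("first task saw execution id " + seen[0]);
        System.out.println("second task saw execution id " + seen[1]);
    }
}
```

Because each task carries its own snapshot, the first task still sees 7 after the driver switches to 8; only tasks submitted afterwards pick up the new value.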

醉话见心

2020-12-18 05:03

I did some testing with the property spark.executor.memory (the available properties are listed in the Spark configuration documentation). On a very simple local Spark setup, starting two threads, each with a different setting, the settings do seem to be confined to their threads. With the code at the end of this post (probably not code you would deploy to production), which interleaves the threads to make sure the result is not down to sheer scheduling luck, I get the following output (with Spark's own console output removed):

Thread 1 Before sleeping mem: 512
Thread 2 Before sleeping mem: 1024
Thread 1 After sleeping mem: 512
Thread 2 After sleeping mem: 1024

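It is neat to see that a property declared in a thread stays inside that thread, although I am fairly sure this can easily lead to nonsensical situations, so I would still recommend caution before applying such techniques. Keep in mind that the output only shows that getLocalProperty returns the value set in the same thread; it does not show that any executor actually got a different amount of memory, since executor memory is fixed when the executors are launched (and in local mode there is no separate executor JVM at all).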

import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;

import scala.Tuple2;

public class App {
    // A single context shared by both threads; local properties are kept per thread.
    private static JavaSparkContext sc;

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local")
                .setAppName("Testing App");
        sc = new JavaSparkContext(conf);
        SparkThread thread1 = new SparkThread(1);
        SparkThread thread2 = new SparkThread(2);
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            Future<?> threadCompletion1 = executor.submit(thread1);
            // Start thread 2 while thread 1 is still sleeping, so the two interleave.
            Thread.sleep(5000);
            Future<?> threadCompletion2 = executor.submit(thread2);
            threadCompletion1.get();
            threadCompletion2.get();
        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        } finally {
            // Without shutdown() the pool's non-daemon threads keep the JVM running.
            executor.shutdown();
            sc.stop();
        }
    }

    private static class SparkThread implements Runnable {
        private final int i;

        public SparkThread(int i) {
            this.i = i;
        }

        @Override
        public void run() {
            int mem = 512;
            // Visible only to this thread (and to threads it creates afterwards).
            sc.setLocalProperty("spark.executor.memory", Integer.toString(mem * i));
            JavaRDD<String> input = sc.textFile("test" + i);

            FlatMapFunction<String, String> tt = s -> Arrays.asList(s.split(" "))
                    .iterator();
            JavaRDD<String> words = input.flatMap(tt);
            System.out.println("Thread " + i + " Before sleeping mem: " + sc.getLocalProperty("spark.executor.memory"));

            try {
                Thread.sleep(7000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            // Word count; saveAsTextFile triggers the actual Spark job from this thread.
            JavaPairRDD<String, Integer> counts = words.mapToPair(t -> new Tuple2<>(t, 1))
                    .reduceByKey((x, y) -> x + y);

            counts.saveAsTextFile("output" + i);
            System.out.println("Thread " + i + " After sleeping mem: " + sc.getLocalProperty("spark.executor.memory"));
        }
    }
}