spark 2.1.0 session config settings (pyspark)

情深已故  2020-12-12 16:27

I am trying to overwrite the default configs of the Spark session/Spark context, but it keeps picking up the entire node/cluster resources.

    spark = SparkSession.builder
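
More fully, the pattern looks roughly like this (the master URL and the memory/core values below are just placeholders):

    from pyspark.sql import SparkSession

    # build the session first (placeholder master URL)
    spark = (SparkSession.builder
             .master("yarn")
             .getOrCreate())

    # ...then try to override resource settings on the already-running session
    # (placeholder sizes)
    spark.conf.set("spark.executor.memory", "2g")
    spark.conf.set("spark.executor.cores", "2")
    spark.conf.set("spark.driver.memory", "2g")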
            


        
5 Answers
  •  愿得一人
    2020-12-12 16:55

    I had a rather different requirement: I had to check whether executor and driver memory sizes were passed in as parameters and, if they were, rebuild the config changing only the executor and driver settings. Below are the steps:

    1. Import Libraries
    from pyspark.conf import SparkConf
    from pyspark.sql import SparkSession
    
    2. Define Spark and get the default configuration
    spark = (SparkSession.builder
            .master("yarn")
            .appName("experiment") 
            .config("spark.hadoop.fs.s3a.multiobjectdelete.enable", "false")
            .getOrCreate())
    
    conf = spark.sparkContext._conf.getAll()
    
    3. Check whether the executor and driver memory sizes were provided (this is pseudo-code with a single conditional check; you can build out the remaining cases), then either apply the given parameters or fall back to the default configuration.
    if executor_mem is not None and driver_mem is not None:
        # rebuild the conf with the requested executor/driver memory sizes
        conf = spark.sparkContext._conf.setAll([('spark.executor.memory', executor_mem),
                                                ('spark.driver.memory', driver_mem)])
        # stop the running context and recreate the session with the updated conf
        spark.sparkContext.stop()
        spark = SparkSession.builder.config(conf=conf).getOrCreate()
    else:
        # no sizes were passed in, so keep the default session
        pass
    

    Don't forget to stop the existing Spark context; this is what ensures the new session actually comes up with the executor and driver memory sizes you passed in as parameters. Hope this helps!
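
    As a quick sanity check (a minimal sketch that just reads back whatever sizes you passed in), you can query the rebuilt session for the effective values:

    # read the effective settings back from the new session
    print(spark.conf.get("spark.executor.memory"))
    print(spark.conf.get("spark.driver.memory"))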
