I am trying to overwrite the Spark session/Spark context default configs, but it keeps picking up the entire node/cluster resources.
spark = SparkSession.builder
I had a somewhat different requirement: check whether executor and driver memory sizes are passed in as parameters, and if they are, rebuild the config with only the executor and driver settings changed. Below are the steps:
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

# executor_mem and driver_mem are the (optional) memory sizes passed in as parameters, e.g. '8g'
spark = (SparkSession.builder
         .master("yarn")
         .appName("experiment")
         .config("spark.hadoop.fs.s3a.multiobjectdelete.enable", "false")
         .getOrCreate())

conf = spark.sparkContext._conf.getAll()  # current config as a list of (key, value) tuples

if executor_mem is not None and driver_mem is not None:
    # Override only the executor and driver memory; setAll returns the updated SparkConf
    conf = spark.sparkContext._conf.setAll([('spark.executor.memory', executor_mem),
                                            ('spark.driver.memory', driver_mem)])
    # Stop the old context so the new session is built with the updated config
    spark.sparkContext.stop()
    spark = SparkSession.builder.config(conf=conf).getOrCreate()
else:
    spark = spark  # nothing to change, keep the existing session
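If you want to confirm the overrides took effect, you can read the values back from the rebuilt session (a quick sanity check, not part of the original steps):

# Read the effective settings back from the running context
print(spark.sparkContext.getConf().get('spark.executor.memory'))
print(spark.sparkContext.getConf().get('spark.driver.memory'))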
Don't forget to stop the Spark context; that is what makes sure the executor and driver memory sizes you passed in as parameters actually take effect. Hope this helps!
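In the snippet above, executor_mem and driver_mem come from outside the code (whatever parameters your job receives). A minimal sketch, assuming they arrive as command-line arguments, of the same steps wrapped in a helper function:

import sys
from pyspark.sql import SparkSession

def build_session(executor_mem=None, driver_mem=None):
    # Start (or reuse) a session with the baseline config
    spark = (SparkSession.builder
             .master("yarn")
             .appName("experiment")
             .getOrCreate())
    if executor_mem is not None and driver_mem is not None:
        # Override only the memory settings, then rebuild the session
        conf = spark.sparkContext._conf.setAll([('spark.executor.memory', executor_mem),
                                                ('spark.driver.memory', driver_mem)])
        spark.sparkContext.stop()
        spark = SparkSession.builder.config(conf=conf).getOrCreate()
    return spark

# e.g. spark-submit job.py 8g 4g
# spark = build_session(sys.argv[1], sys.argv[2])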