I'm using Spark 2.0 with PySpark.

I am redefining SparkSession parameters through the builder's getOrCreate method that was introduced in 2.0:
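Roughly, what I am doing looks like this (a sketch with placeholder values; the app name and the spark.sql.shuffle.partitions setting are just examples, not my real configuration):

from pyspark import SparkConf
from pyspark.sql import SparkSession

# a session already exists (for example the one the pyspark shell creates);
# I pass a new SparkConf to the builder and expect getOrCreate() to apply it
conf = SparkConf() \
    .setAppName("foo") \
    .set("spark.sql.shuffle.partitions", "2001")

spark = SparkSession.builder.config(conf=conf).getOrCreate()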
I believe the documentation is a bit misleading here, and when you work with Scala you actually see a warning like this:
... WARN SparkSession$Builder: Use an existing SparkSession, some configuration may not take effect.
It was more obvious prior to Spark 2.0, with a clear separation between contexts:

- SparkContext configuration cannot be modified at runtime. You have to stop the existing context first.
- SQLContext configuration can be modified at runtime.

spark.app.name, like many other options, is bound to SparkContext, and cannot be modified without stopping the context.
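For example, a pre-2.0 session behaves roughly like this (a PySpark sketch; the app names, master and configuration values are purely illustrative):

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

sc = SparkContext(conf=SparkConf().setAppName("foo").setMaster("local[2]"))
sqlContext = SQLContext(sc)

# SQLContext configuration can be changed while the context is running
sqlContext.setConf("spark.sql.shuffle.partitions", "10")

# SparkContext configuration cannot; the only way to change it is to stop
# the context and create a new one with a different SparkConf
sc.stop()
sc = SparkContext(conf=SparkConf().setAppName("bar").setMaster("local[2]"))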
Reusing existing SparkContext / SparkSession
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
spark.conf.get("spark.sql.shuffle.partitions")
String = 200
val conf = new SparkConf()
.setAppName("foo")
.set("spark.sql.shuffle.partitions", "2001")
val spark = SparkSession.builder.config(conf).getOrCreate()
... WARN SparkSession$Builder: Use an existing SparkSession ...
spark: org.apache.spark.sql.SparkSession = ...
spark.conf.get("spark.sql.shuffle.partitions")
String = 2001
While the spark.app.name config is updated:
spark.conf.get("spark.app.name")
String = foo
it doesn't affect SparkContext
:
spark.sparkContext.appName
String = Spark shell
Stopping existing SparkContext / SparkSession
Now let's stop the session and repeat the process:
spark.stop
val spark = SparkSession.builder.config(conf).getOrCreate()
// ... WARN SparkContext: Use an existing SparkContext ...
// spark: org.apache.spark.sql.SparkSession = ...

spark.sparkContext.appName
// String = foo
Interestingly, when we stop the session we still get a warning about using an existing SparkContext, but you can check that it has actually been stopped.
I ran into the same problem and struggled with it for a long time, then found a simple solution:

spark.stop()

Then build your new SparkSession again.
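In PySpark that amounts to something like this (a sketch; the configuration values are illustrative):

from pyspark import SparkConf
from pyspark.sql import SparkSession

# grab the existing session, e.g. the one the pyspark shell provides
spark = SparkSession.builder.getOrCreate()

# stop it; this also stops the underlying SparkContext
spark.stop()

# build a fresh session: this time SparkContext-bound options
# such as spark.app.name take effect
conf = SparkConf().setAppName("foo").set("spark.sql.shuffle.partitions", "2001")
spark = SparkSession.builder.config(conf=conf).getOrCreate()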