I have a Spark app which runs with no problem in local mode, but has some problems when submitting to the Spark cluster.
The error message is as follows:
org.apache.spark.SparkException: A master URL must be set in your configuration
If you are using the following code:
val sc = new SparkContext(master, "WordCount", System.getenv("SPARK_HOME"))
then replace it with the following lines:
val jobName = "WordCount"
val conf = new SparkConf().setAppName(jobName)
val sc = new SparkContext(conf)
In Spark 2.0 you can use the following code:
val spark = SparkSession
.builder()
.appName("Spark SQL basic example")
.config("spark.some.config.option", "some-value")
.master("local[*]")// need to add
.getOrCreate()
You need to add .master("local[*]") if you are running locally; here * means use all available cores, and you can put a specific number (1, 2, 8, etc.) instead.
You need to set the actual master URL if you are running on a cluster.
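For a cluster, a minimal sketch of the same builder with a standalone master URL (the host and port below are placeholders, not from the question):

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .master("spark://master-host:7077") // placeholder standalone master URL
  .getOrCreate()

Alternatively, leave .master(...) out of the code entirely and pass the URL at submission time, e.g. spark-submit --master spark://master-host:7077 --class com.example.Main app.jar (the class and jar names here are placeholders); the submitted value will then be picked up.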
How does the SparkContext in your application pick the value for the Spark master?
It can come either from the SparkConf passed in while creating the SparkContext, or from System.getProperties (where SparkSubmit earlier put it after reading your --master argument).
Now, SparkSubmit runs on the driver -- which in your case is the machine from where you're executing the spark-submit script. And this is probably working as expected for you too.
However, from the information you've posted it looks like you are creating a spark context in the code that is sent to the executor -- and given that there is no spark.master
system property available there, it fails. (And you shouldn't really be doing so, if this is the case.)
Can you please post the GroupEvolutionES code (specifically where you're creating SparkContext(s))?
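To illustrate the point, a generic sketch (not the asker's GroupEvolutionES code, whose details we haven't seen): the context is created once on the driver, where spark-submit has already set spark.master, and SparkContext is never constructed inside closures that run on executors.

import org.apache.spark.{SparkConf, SparkContext}

object Driver {
  def main(args: Array[String]): Unit = {
    // Runs on the driver; spark.master was already set by spark-submit --master
    val sc = new SparkContext(new SparkConf().setAppName("example"))

    val numbers = sc.parallelize(1 to 100)

    // Wrong: new SparkContext(new SparkConf()) inside numbers.map { ... } would run
    // on the executors, where no spark.master property exists, and fail there.

    // Right: closures contain only ordinary computation
    val doubled = numbers.map(_ * 2)
    println(doubled.sum())

    sc.stop()
  }
}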
Try this: make a trait
import org.apache.spark.sql.SparkSession
trait SparkSessionWrapper {
lazy val spark: SparkSession = {
SparkSession
.builder()
.getOrCreate()
}
}
Extend it:
object Preprocess extends SparkSessionWrapper
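Filled in with an illustrative body (the method name and the JSON input are placeholders, not from the original answer), it might look like this. Every object that extends SparkSessionWrapper shares the same session, because getOrCreate returns the existing one; the master itself still has to come from spark-submit --master or spark-defaults.conf.

import org.apache.spark.sql.DataFrame

object Preprocess extends SparkSessionWrapper {
  // `spark` comes from the trait; the session is created lazily on first use
  def loadRaw(path: String): DataFrame =
    spark.read.json(path)
}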
I used this SparkContext constructor instead, and the errors were gone:
val sc = new SparkContext("local[*]", "MyApp")
val appName: String = "test"
val conf = new SparkConf().setAppName(appName).setMaster("local[*]").set("spark.executor.memory", "1g")
val sc = SparkContext.getOrCreate(conf)
sc.setLogLevel("WARN")
The expected value of "spark.master" is of the form spark://HOST:PORT; the following code tries to get a session from the standalone cluster running at HOST:PORT, and it expects the HOST:PORT value to be set in the Spark configuration file.
SparkSession spark = SparkSession
.builder()
.appName("SomeAppName")
.getOrCreate();
"org.apache.spark.SparkException: A master URL must be set in your configuration" states that HOST:PORT is not set in the spark configuration file.
To not bother about value of "HOST:PORT", set spark.master as local
SparkSession spark = SparkSession
.builder()
.appName("SomeAppName")
.config("spark.master", "local")
.getOrCreate();
Here is the link to the list of formats in which the master URL can be passed to spark.master.
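For quick reference, these are the common formats (a sketch in Scala; see the Spark documentation for the authoritative list):

import org.apache.spark.sql.SparkSession

// Common master URL formats accepted by spark.master:
//   local              - run locally with a single worker thread
//   local[K]           - run locally with K worker threads
//   local[*]           - run locally with as many threads as logical cores
//   spark://HOST:PORT  - connect to a Spark standalone cluster master
//   yarn               - connect to a YARN cluster (cluster config read from HADOOP_CONF_DIR)
//   mesos://HOST:PORT  - connect to a Mesos cluster
//   k8s://HOST:PORT    - connect to a Kubernetes cluster
val spark = SparkSession.builder()
  .appName("SomeAppName")
  .master("local[*]") // any of the formats above
  .getOrCreate()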
Reference: Spark Tutorial - Setup Spark Ecosystem