I am trying to understand how Spark runs on a YARN cluster/client. I have the following questions in my mind.
Is it necessary that Spark is installed on all the nodes in the YARN cluster? And does the client node from which the job is submitted need Hadoop installed?
We are running Spark jobs on YARN (we use HDP 2.2).
We don't have Spark installed on the cluster; we only added the Spark assembly jar to HDFS.
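For reference, uploading the assembly looks roughly like this (a sketch assuming the stock Spark 1.3.1 binary distribution, where the assembly jar sits under lib/; the /spark HDFS path matches the command below):

# create a directory in HDFS and upload the assembly jar from the client
hdfs dfs -mkdir -p /spark
hdfs dfs -put $SPARK_HOME/lib/spark-assembly-1.3.1-hadoop2.6.0.jar /spark/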
For example, to run the SparkPi example:
./bin/spark-submit \
--verbose \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--conf spark.yarn.jar=hdfs://master:8020/spark/spark-assembly-1.3.1-hadoop2.6.0.jar \
--num-executors 2 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 4 \
hdfs://master:8020/spark/spark-examples-1.3.1-hadoop2.6.0.jar 100
--conf spark.yarn.jar=hdfs://master:8020/spark/spark-assembly-1.3.1-hadoop2.6.0.jar
- This config tells YARN where to fetch the Spark assembly from. If you don't set it, spark-submit will upload the assembly jar from the machine where you run it on every submission.
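If you don't want to pass this flag on every run, you can set it once in conf/spark-defaults.conf on the client machine (one property per line, key and value separated by whitespace; the HDFS path below is the one from the command above):

spark.yarn.jar hdfs://master:8020/spark/spark-assembly-1.3.1-hadoop2.6.0.jar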
About your second question: the client node doesn't need Hadoop installed. It only needs the Hadoop/YARN configuration files. You can copy the configuration directory from your cluster to your client and point Spark at it via HADOOP_CONF_DIR (or YARN_CONF_DIR).
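For example, something like this (a sketch; /etc/hadoop/conf and /opt/hadoop-conf are assumed paths, adjust to your layout):

# copy the client-side Hadoop config from the cluster, then tell Spark where it is
scp -r master:/etc/hadoop/conf /opt/hadoop-conf
export HADOOP_CONF_DIR=/opt/hadoop-conf
export YARN_CONF_DIR=/opt/hadoop-conf
./bin/spark-submit --master yarn-cluster ...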