If I start up pyspark and then run this command:
import my_script; spark = my_script.Sparker(sc); spark.collapse('./data/')
Everything works fine.
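For context, here is a minimal sketch of what such a my_script.py might look like; the names Sparker and collapse come from the question above, but the body is purely hypothetical:

# my_script.py (hypothetical sketch)
class Sparker(object):
    def __init__(self, sc):
        # keep the SparkContext that the caller passes in
        self.sc = sc

    def collapse(self, path):
        # stand-in logic: read every file under path and count the lines
        return self.sc.textFile(path).count()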
spark-submit is a utility to submit your Spark program (or job) to Spark clusters. If you open the spark-submit script, you will see that it eventually invokes a Scala class:
org.apache.spark.deploy.SparkSubmit
On the other hand, pyspark and spark-shell are REPL (read-eval-print loop) utilities that let developers run and evaluate their Spark code line by line as they write it.
Eventually, both of them run a job behind the scenes, and most of the options are the same, as you can see by comparing the help output of the following commands:
spark-submit --help
pyspark --help
spark-shell --help
spark-submit has some additional options to take your Spark program (Scala or Python) either as a bundle (a jar for Scala/Java, or a zip of .py files for Python) or as an individual .py file.
spark-submit --help
Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
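To run the same script through spark-submit, note that (unlike the pyspark shell) spark-submit does not pre-create sc for you, so the submitted file needs its own entry point that builds a SparkContext/SparkSession. A minimal sketch, assuming the hypothetical my_script.py from above (run.py and the app name are made up for illustration):

# run.py (hypothetical entry point for spark-submit)
import sys
from pyspark.sql import SparkSession

import my_script

if __name__ == "__main__":
    # spark-submit does not inject sc, so create the session here
    session = SparkSession.builder.appName("collapse-job").getOrCreate()
    sc = session.sparkContext

    path = sys.argv[1] if len(sys.argv) > 1 else "./data/"
    print(my_script.Sparker(sc).collapse(path))

    session.stop()

You could then submit it with something like spark-submit --master local[*] run.py ./data/ (the master URL is only an example).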
Both also provide a Web UI to track the Spark job's progress and other metrics.
When you kill your REPL (pyspark or spark-shell) with Ctrl+C, the Spark session is killed and the Web UI can no longer show those details.
If you look into spark-shell, it has one additional option to run a script line by line using -I:
Scala REPL options:
  -I <file>    preload <file>, enforcing line-by-line interpretation
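For example, spark-shell -I init.scala would interpret the statements in init.scala one at a time before dropping you into the interactive prompt (init.scala is just a placeholder file name here).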