If I start up pyspark and then run this command:
import my_script; spark = my_script.Sparker(sc); spark.collapse('./data/')
Everything works fine.
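For context, here is a minimal sketch of what such a my_script.py might look like; the names Sparker and collapse come from the question above, but the body is purely hypothetical:

# my_script.py (hypothetical sketch)
class Sparker(object):
    def __init__(self, sc):
        # keep the SparkContext that the caller passes in
        self.sc = sc

    def collapse(self, path):
        # stand-in logic: read every file under path and count the lines
        return self.sc.textFile(path).count()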
spark-submit is a utility to submit your Spark program (or job) to Spark clusters. If you open the spark-submit script, you will see that it eventually invokes a Scala class:
org.apache.spark.deploy.SparkSubmit
On the other hand, pyspark and spark-shell are REPL (read-eval-print loop) utilities that let developers run and evaluate their Spark code line by line as they write it.
Eventually, both of them run a job behind the scenes, and most of the options are the same, as you can see by comparing the help output of the following commands:
spark-submit --help
pyspark --help
spark-shell --help
spark-submit has some additional options to take your Spark program (Scala or Python) either as a bundle (a jar for Scala/Java, or a zip of .py files for Python) or as an individual .py file.
spark-submit --help
Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
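To run the same script through spark-submit, note that (unlike the pyspark shell) spark-submit does not pre-create sc for you, so the submitted file needs its own entry point that builds a SparkContext/SparkSession. A minimal sketch, assuming the hypothetical my_script.py from above (run.py and the app name are made up for illustration):

# run.py (hypothetical entry point for spark-submit)
import sys
from pyspark.sql import SparkSession

import my_script

if __name__ == "__main__":
    # spark-submit does not inject sc, so create the session here
    session = SparkSession.builder.appName("collapse-job").getOrCreate()
    sc = session.sparkContext

    path = sys.argv[1] if len(sys.argv) > 1 else "./data/"
    print(my_script.Sparker(sc).collapse(path))

    session.stop()

You could then submit it with something like spark-submit --master local[*] run.py ./data/ (the master URL is only an example).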
Both also provide a Web UI to track the Spark job's progress and other metrics.
When you kill your REPL (pyspark or spark-shell) with Ctrl+C, the Spark session is killed and the Web UI can no longer show those details.
If you look into spark-shell, it has one additional option to run a script line by line using -I:
Scala REPL options:
  -I <file>    preload <file>, enforcing line-by-line interpretation
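For example, spark-shell -I init.scala would interpret the statements in init.scala one at a time before dropping you into the interactive prompt (init.scala is just a placeholder file name here).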