What is the difference between spark-submit and pyspark?

走了就别回头了 2020-12-01 14:36

If I start up pyspark and then run this command:

import my_script; spark = my_script.Sparker(sc); spark.collapse('./data/')

Everything is working fine.

3 Answers
  •  予麋鹿 2020-12-01 15:06

    spark-submit is a utility to submit your Spark program (or job) to a Spark cluster. If you open the spark-submit utility, you will see that it eventually calls a Scala program:

    org.apache.spark.deploy.SparkSubmit 
    
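    In fact, the bin/spark-submit wrapper in recent Spark distributions is a thin shell script; at the time of writing, its final line is essentially:

    exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"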

    On the other hand, pyspark and spark-shell are REPL (read-eval-print loop) utilities that let developers run and evaluate their Spark code interactively, line by line, as they write it.
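
    For example, a minimal pyspark session (output abridged) might look like the following; note that the shell pre-creates the SparkContext as sc, so there is nothing to construct by hand:

    $ pyspark
    >>> sc
    <SparkContext master=local[*] appName=PySparkShell>
    >>> sc.parallelize([1, 2, 3]).map(lambda x: x * 2).collect()
    [2, 4, 6]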

    Eventually, both of them run a job behind the scenes, and the majority of the options are the same, as you can see by comparing the output of the following commands:

    spark-submit --help
    pyspark --help
    spark-shell --help
    

    spark-submit has some additional options to take your Spark program (Scala or Python) as a bundle (a jar, or a zip for Python) or as an individual .py or .class file; a hypothetical invocation is sketched after the usage summary below.

    spark-submit --help
    Usage: spark-submit [options] <app jar | python file> [app arguments]
    Usage: spark-submit --kill [submission ID] --master [spark://...]
    Usage: spark-submit --status [submission ID] --master [spark://...]
    
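    For example, a hypothetical submission that bundles Python dependencies (deps.zip and main.py are made-up names for illustration) could look like:

    spark-submit --master yarn --py-files deps.zip main.py ./data/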

    Both of them also provide a Web UI to track Spark job progress and other metrics.

    When you kill your REPL session (pyspark or spark-shell) using Ctrl+C, your Spark session is killed and the Web UI can no longer show the details.

    If you look into spark-shell, it has one additional option to run a script line by line using -I:

    Scala REPL options:
      -I <file>                   preload <file>, enforcing line-by-line interpretation
    
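    Tying this back to the question: inside the pyspark REPL the SparkContext already exists as sc, whereas a script run through spark-submit must create its own. Below is a minimal sketch of what my_script.py could look like so that it runs in both modes; Sparker and collapse come from the question, but their bodies are not shown there, so the implementations here are assumptions for illustration only:

    # my_script.py -- sketch only; collapse() is an assumed implementation
    from pyspark import SparkContext

    class Sparker(object):
        def __init__(self, sc):
            self.sc = sc

        def collapse(self, path):
            # Assumed behavior: read the text files under `path` and count lines
            return self.sc.textFile(path).count()

    if __name__ == "__main__":
        # Under spark-submit there is no pre-built sc, so create one here.
        # In the pyspark REPL, import the module and pass the shell's sc instead.
        import sys
        sc = SparkContext(appName="my_script")
        print(Sparker(sc).collapse(sys.argv[1] if len(sys.argv) > 1 else "./data/"))
        sc.stop()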
