How to know the deploy mode of a PySpark application?

佛祖请我去吃肉 2020-12-18 01:28

I am trying to fix an issue with running out of memory, and I want to know whether I need to change these settings in the default configurations file (spark-defaults.c

3 Answers
  • 2020-12-18 02:16

    If you are running an interactive shell, e.g. pyspark (CLI or via an IPython notebook), by default you are running in client mode. You can easily verify that you cannot run pyspark or any other interactive shell in cluster mode:

    $ pyspark --master yarn --deploy-mode cluster
    Python 2.7.11 (default, Mar 22 2016, 01:42:54)
    [GCC Intel(R) C++ gcc 4.8 mode] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    Error: Cluster deploy mode is not applicable to Spark shells.
    
    $ spark-shell --master yarn --deploy-mode cluster
    Error: Cluster deploy mode is not applicable to Spark shells.
    

    Examining the contents of the bin/pyspark file may be instructive, too - here is the final line (which is the actual executable):

    $ pwd
    /home/ctsats/spark-1.6.1-bin-hadoop2.6
    $ cat bin/pyspark
    [...]
    exec "${SPARK_HOME}"/bin/spark-submit pyspark-shell-main --name "PySparkShell" "$@"
    

    That is, pyspark is actually a script run through spark-submit under the name PySparkShell, which is the name it appears under in the Spark History Server UI. Because it is launched this way, it picks up whatever arguments (or defaults) are passed to its spark-submit command.

  • 2020-12-18 02:18

    Since sc.deployMode is not available in PySpark, you can check the spark.submit.deployMode configuration property instead:

    scala> sc.getConf.get("spark.submit.deployMode")
    res0: String = client
    

    In Scala (this is not available in PySpark), use sc.deployMode:

    scala> sc.deployMode
    res0: String = client
    
    scala> sc.version
    res1: String = 2.1.0-SNAPSHOT
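
The same property lookup can be sketched in Python. This is a minimal sketch, not part of the original answer: the helper name `get_deploy_mode` is hypothetical, and the `"client"` fallback is an assumption based on client being the default mode when none is specified. It relies only on `SparkContext.getConf()` and `SparkConf.get(key, defaultValue)`.

```python
def get_deploy_mode(sc):
    """Return the deploy mode recorded in a SparkContext's configuration.

    `sc` is expected to be a pyspark SparkContext (or any object exposing
    getConf().get(key, default)). The helper name is hypothetical.
    """
    # SparkConf.get returns the second argument when the key is unset;
    # falling back to "client" is an assumption, not something Spark enforces here.
    return sc.getConf().get("spark.submit.deployMode", "client")
```

In a pyspark shell, where `sc` already exists, calling `get_deploy_mode(sc)` should report client mode, consistent with the first answer.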
    
  • As of Spark 2.0 and later, the following works:

    for item in spark.sparkContext.getConf().getAll():
        print(item)
    
    (u'spark.submit.deployMode', u'client') # will be one of the items in the list.
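
Rather than scanning the printed list by eye, the pairs returned by `getAll()` can be turned into a dictionary and queried directly. This is a minimal sketch: the function name `deploy_mode_from_pairs` is hypothetical, and it assumes only that `getAll()` returns a list of `(key, value)` tuples, as the output above shows.

```python
def deploy_mode_from_pairs(pairs):
    """Look up the deploy mode in a list of (key, value) configuration pairs.

    `pairs` has the shape returned by SparkConf.getAll(). Returns None when
    the property is not present. The function name is hypothetical.
    """
    conf = dict(pairs)  # each (key, value) tuple becomes a dict entry
    return conf.get("spark.submit.deployMode")
```

With a live session this would be called as `deploy_mode_from_pairs(spark.sparkContext.getConf().getAll())`.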
    