Add Jar to standalone pyspark

[愿得一人] 2020-11-27 16:49

I'm launching a pyspark program:

$ export SPARK_HOME=
$ export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip
$ python

How do I add jar dependencies, such as the Databricks spark-csv jar, to this standalone setup?
5 Answers
  •  悲&欢浪女
    2020-11-27 17:12

    Any dependencies can be passed using the spark.jars.packages property (setting spark.jars should work as well) in $SPARK_HOME/conf/spark-defaults.conf. It should be a comma-separated list of Maven coordinates.
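
    For example, a minimal spark-defaults.conf entry might look like the sketch below (the spark-csv coordinate is only an illustration; list whichever packages your job actually needs):

    # $SPARK_HOME/conf/spark-defaults.conf
    # Comma-separated Maven coordinates, resolved and added to the classpath at launch
    spark.jars.packages  com.databricks:spark-csv_2.11:1.2.0

    # Alternatively, spark.jars takes a comma-separated list of local jar paths
    # (the paths below are hypothetical)
    # spark.jars  /opt/jars/spark-csv_2.11-1.2.0.jar,/opt/jars/commons-csv-1.1.jar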

    Both the packages and classpath properties have to be set before the JVM is started, and this happens during SparkConf initialization. This means that the SparkConf.set method cannot be used here.

    An alternative approach is to set the PYSPARK_SUBMIT_ARGS environment variable before the SparkConf object is initialized:

    import os
    from pyspark import SparkConf, SparkContext

    # Must be set before SparkConf/SparkContext starts the JVM
    SUBMIT_ARGS = "--packages com.databricks:spark-csv_2.11:1.2.0 pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS

    # The JVM is launched here and picks up the --packages argument
    conf = SparkConf()
    sc = SparkContext(conf=conf)
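
    Once the SparkContext is up with the package on the classpath, the data source can be used as usual. A minimal usage sketch, assuming Spark 1.x and a hypothetical cars.csv input file:

    from pyspark.sql import SQLContext

    sqlContext = SQLContext(sc)

    # The com.databricks.spark.csv format is now resolvable
    df = (sqlContext.read
          .format("com.databricks.spark.csv")
          .option("header", "true")
          .option("inferSchema", "true")
          .load("cars.csv"))  # hypothetical input file
    df.printSchema()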
    
