I'm launching a pyspark program:
$ export SPARK_HOME=
$ export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip
$ python
Any dependencies can be passed using the spark.jars.packages property (setting spark.jars should work as well) in $SPARK_HOME/conf/spark-defaults.conf. The value should be a comma-separated list of Maven coordinates.
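For example, using the same spark-csv package that appears below, the corresponding entry in spark-defaults.conf could look roughly like this (a sketch, not the only valid form):

# $SPARK_HOME/conf/spark-defaults.conf
spark.jars.packages    com.databricks:spark-csv_2.11:1.2.0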
Packages and classpath properties have to be set before the JVM is started, and that happens during SparkConf initialization. This means the SparkConf.set method cannot be used here.
An alternative approach is to set the PYSPARK_SUBMIT_ARGS environment variable before the SparkConf object is initialized:
import os
from pyspark import SparkConf, SparkContext

# Must be set before SparkConf() is created, since that is when the JVM is launched
SUBMIT_ARGS = "--packages com.databricks:spark-csv_2.11:1.2.0 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS

conf = SparkConf()
sc = SparkContext(conf=conf)
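Once the context is up, the classes from the downloaded package are on the classpath and can be used as usual. A minimal usage sketch, assuming a hypothetical local file cars.csv:

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

# Read a CSV file through the spark-csv data source pulled in via --packages
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("cars.csv"))  # hypothetical path, replace with your own file
df.show()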