Question
I'm trying to automatically include jars in my PySpark classpath. Right now I can type the following command and it works:
$ pyspark --jars /path/to/my.jar
I'd like to have that jar included by default, so that I only need to type pyspark and so that it also works in IPython Notebook.
I've read that I can pass the argument by setting PYSPARK_SUBMIT_ARGS in the environment:
export PYSPARK_SUBMIT_ARGS="--jars /path/to/my.jar"
Unfortunately the above doesn't work. I get the runtime error Failed to load class for data source.
I'm running Spark 1.3.1.
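A detail worth checking with this approach: on newer Spark releases (1.4+), PYSPARK_SUBMIT_ARGS is reportedly only honored when it ends with the token pyspark-shell, so the export would look like this (same placeholder path as above):
export PYSPARK_SUBMIT_ARGS="--jars /path/to/my.jar pyspark-shell"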
Edit
My workaround when using IPython Notebook is the following:
$ IPYTHON_OPTS="notebook" pyspark --jars /path/to/my.jar
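To make that workaround the default, one option is a shell alias (a minimal sketch, assuming bash; the alias name pysparknb is made up):
alias pysparknb='IPYTHON_OPTS="notebook" pyspark --jars /path/to/my.jar'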
Answer 1:
You can add the jar files in spark-defaults.conf (located in the conf folder of your Spark installation). If there is more than one jar in the list, use : as the separator.
spark.driver.extraClassPath /path/to/my.jar
This property is documented at https://spark.apache.org/docs/1.3.1/configuration.html#runtime-environment
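For example, an entry listing two jars would look like this (both paths are placeholders; on Windows the classpath separator is ; instead of :):
spark.driver.extraClassPath /path/to/my.jar:/path/to/other.jar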
Answer 2:
As far as I know, you have to make the jars available to both the driver AND the executors. So you need to edit conf/spark-defaults.conf, adding both lines below (note that these settings do not copy the jar anywhere, so the file must already exist at the given path on every node).
spark.driver.extraClassPath /path/to/my.jar
spark.executor.extraClassPath /path/to/my.jar
When I went through this, I did not need any other parameters. I guess you will not need them either.
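A quick sanity check from the PySpark shell that the driver actually picked up the jar (a sketch: sc._jvm is PySpark's internal py4j gateway to the JVM, and com.example.MyDataSource is a placeholder class name; the call raises an error wrapping ClassNotFoundException if the jar is not on the classpath):
>>> sc._jvm.java.lang.Class.forName("com.example.MyDataSource")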
Answer 3:
The recommended way since Spark 2.0+ is to use
spark.driver.extraLibraryPath
and spark.executor.extraLibraryPath
https://spark.apache.org/docs/2.4.3/configuration.html#runtime-environment
P.S. spark.driver.extraClassPath and spark.executor.extraClassPath still exist, but they are deprecated and will be removed in a future release of Spark.
Source: https://stackoverflow.com/questions/31464845/automatically-including-jars-to-pyspark-classpath