Automatically including jars to PySpark classpath

Submitted by 邮差的信 on 2019-11-27 21:02:08

Question


I'm trying to automatically include jars in my PySpark classpath. Right now I can type the following command and it works:

$ pyspark --jars /path/to/my.jar

I'd like to have that jar included by default so that I can just type pyspark and also use it in IPython Notebook.

I've read that I can include the argument by setting PYSPARK_SUBMIT_ARGS in the environment:

export PYSPARK_SUBMIT_ARGS="--jars /path/to/my.jar"

Unfortunately, the above doesn't work; I get the runtime error "Failed to load class for data source".

I'm running Spark 1.3.1.

Edit

My workaround when using IPython Notebook is the following:

$ IPYTHON_OPTS="notebook" pyspark --jars /path/to/my.jar

Answer 1:


You can add the jar files in the spark-defaults.conf file (located in the conf folder of your Spark installation). If there is more than one jar in the list, use : as the separator.

spark.driver.extraClassPath /path/to/my.jar
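
For example, a sketch of the same entry with two jars, where the second path is just a placeholder, would look like this:

spark.driver.extraClassPath /path/to/my.jar:/path/to/other.jar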

This property is documented at https://spark.apache.org/docs/1.3.1/configuration.html#runtime-environment




Answer 2:


As far as I know, you have to add the jars to both the driver AND the executor classpath, so you need to edit conf/spark-defaults.conf and add both lines below.

spark.driver.extraClassPath /path/to/my.jar
spark.executor.extraClassPath /path/to/my.jar
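
After restarting pyspark, one quick sanity check is to try loading a class from the jar through the driver JVM. This is only a minimal sketch: it assumes the interactive pyspark shell (where sc is predefined) and uses com.example.MyDataSource as a hypothetical class name from my.jar.

# Run inside the pyspark shell; sc._jvm is the py4j view of the driver JVM.
# This raises a Py4JJavaError (ClassNotFoundException) if the class is not on the driver classpath.
sc._jvm.java.lang.Class.forName("com.example.MyDataSource")

Note that this only checks the driver side; the executor setting matters once tasks actually need the jar.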

When I went through this, I did not need any other parameters; I guess you won't need them either.




Answer 3:


The recommended way since Spark 2.0+ is to use spark.driver.extraLibraryPath and spark.executor.extraLibraryPath:

https://spark.apache.org/docs/2.4.3/configuration.html#runtime-environment
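
A minimal sketch of what the entries this answer suggests could look like in spark-defaults.conf (the paths are placeholders):

spark.driver.extraLibraryPath /path/to/lib
spark.executor.extraLibraryPath /path/to/lib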

P.S. spark.driver.extraClassPath and spark.executor.extraClassPath are still there, but they are deprecated and will be removed in a future release of Spark.



Source: https://stackoverflow.com/questions/31464845/automatically-including-jars-to-pyspark-classpath
