Add jar to pyspark when using notebook


Question


I'm trying the MongoDB Hadoop integration with Spark but can't figure out how to make the jars accessible from an IPython notebook.

Here is what I'm trying to do:

# set up parameters for reading from MongoDB via Hadoop input format
config = {"mongo.input.uri": "mongodb://localhost:27017/db.collection"}
inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"

# these values worked but others might as well
keyClassName = "org.apache.hadoop.io.Text"
valueClassName = "org.apache.hadoop.io.MapWritable"

# Do some reading from mongo
items = sc.newAPIHadoopRDD(inputFormatClassName, keyClassName, valueClassName, None, None, config)

This code works fine when I launch it in pyspark using the following command:

spark-1.4.1/bin/pyspark --jars 'mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar'

where mongo-hadoop-core-1.4.0.jar and mongo-java-driver-3.0.2.jar are the jars that allow using MongoDB from Java. However, when I do this:

IPYTHON_OPTS="notebook" spark-1.4.1/bin/pyspark --jars 'mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar'

the jars are no longer available and I get the following error:

java.lang.ClassNotFoundException: com.mongodb.hadoop.MongoInputFormat

Does anyone know how to make jars available to Spark in the IPython notebook? I'm pretty sure this is not specific to Mongo, so maybe someone has already succeeded in adding jars to the classpath while using the notebook?


Answer 1:


This looks very similar; please let me know if it helps: https://issues.apache.org/jira/browse/SPARK-5185
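
The issue linked above describes pyspark's --jars flag not placing jars on the driver's classpath in Spark 1.x, which matches the ClassNotFoundException here. A common workaround from that era was to set PYSPARK_SUBMIT_ARGS before the SparkContext is created, passing the jars through both --jars and --driver-class-path. A minimal sketch, assuming Spark 1.4.x and that both jars sit in the working directory (the appName is arbitrary):

import os

# Must be set before any SparkContext exists; in Spark 1.4+ the value
# must end with the "pyspark-shell" token.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar "
    # --driver-class-path works around --jars not reaching the driver
    # classpath (SPARK-5185); note the colon separator here.
    "--driver-class-path mongo-hadoop-core-1.4.0.jar:mongo-java-driver-3.0.2.jar "
    "pyspark-shell"
)

from pyspark import SparkContext
sc = SparkContext(appName="mongo-notebook")

With the context created this way inside the notebook, the sc.newAPIHadoopRDD call from the question should be able to resolve com.mongodb.hadoop.MongoInputFormat.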



Source: https://stackoverflow.com/questions/31677345/add-jar-to-pyspark-when-using-notebook
