Question
I would like to use Spark JDBC with Python. My first step was to add a jar:
%AddJar http://central.maven.org/maven2/org/apache/hive/hive-jdbc/2.0.0/hive-jdbc-2.0.0.jar -f
However, the response:
ERROR: Line magic function `%AddJar` not found.
How can I add JDBC jar files in a python script?
Answer 1:
Presently, this is not possible from a Python notebook alone, but it is understood to be an important requirement. Until it is supported, you can work around it: from the same Spark service instance as your Python notebook, create a Scala notebook and run %AddJar there. All Python notebooks on that same Spark service instance can then access the jar. For Python notebooks that were already active when you added the jar from the Scala notebook, you will need to restart their kernels.
Note that this works for notebook instances on Jupyter 4+, but not necessarily for earlier IPython notebook instances; check the version via the Help -> About menu in a notebook. Any recently created notebook instance will be on Jupyter 4+.
Answer 2:
I don't think this is possible from the notebook's Python kernel, as %AddJar is a Scala-kernel magic function.
You would need to rely on the service provider to add this jar to the Python kernel.
Another thing you could try is sc.addJar(), but I'm not sure how it would work.
Add jar to pyspark when using notebook
Thanks, Charles.
Answer 3:
You can try this:
spark.sparkContext.addFile("filename")
Source: https://stackoverflow.com/questions/37661456/how-to-add-a-jar-to-python-notebook-on-bluemix-spark