Add Jar to standalone pyspark

[愿得一人] 2020-11-27 16:49

I'm launching a pyspark program:

$ export SPARK_HOME=
$ export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip
$ python
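
Inside that plain python session the context then has to be built by hand; a minimal sketch of the usual standalone pattern, where one common place to attach an extra jar is the spark.jars property (the app name, master URL, and jar path below are placeholders, not taken from the question):

from pyspark import SparkConf, SparkContext

# Placeholder values; spark.jars points Spark at an extra local jar at startup
conf = (SparkConf()
        .setMaster("local[*]")
        .setAppName("standalone-app")
        .set("spark.jars", "/path/to/extra.jar"))
sc = SparkContext(conf=conf)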
5 Answers
  •  渐次进展
    2020-11-27 17:11

    I encountered a similar issue for a different jar ("MongoDB Connector for Spark", mongo-spark-connector), but the big caveat was that I installed Spark via pyspark in conda (conda install pyspark). Therefore, most of the Spark-specific answers weren't directly applicable. For those of you installing with conda, here is the process that I cobbled together:

    1) Find where your pyspark/jars are located. Mine were in this path: ~/anaconda2/pkgs/pyspark-2.3.0-py27_0/lib/python2.7/site-packages/pyspark/jars.
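
    If it helps, the same path can be found programmatically instead of hunting through the conda tree; a small sketch that simply derives it from wherever the pyspark package is installed:

    import os
    import pyspark

    # The jars bundled with a pip/conda pyspark install sit next to the package itself
    print(os.path.join(os.path.dirname(pyspark.__file__), "jars"))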

    2) Download the jar file into the path found in step 1, from this location.
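
    For example, here is a rough sketch that fetches the connector jar from Maven Central into that directory (the URL and the local path are my assumptions; substitute whatever download location and path apply to you):

    import os
    import urllib.request  # Python 3 stdlib; on the py27 install above, urllib.urlretrieve is the equivalent

    # Assumed Maven Central coordinates for the connector version used in step 3
    url = ("https://repo1.maven.org/maven2/org/mongodb/spark/"
           "mongo-spark-connector_2.11/2.2.2/mongo-spark-connector_2.11-2.2.2.jar")
    jars_dir = os.path.expanduser("~/anaconda2/pkgs/pyspark-2.3.0-py27_0/"
                                  "lib/python2.7/site-packages/pyspark/jars")  # path from step 1
    urllib.request.urlretrieve(url, os.path.join(jars_dir, os.path.basename(url)))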

    3) Now you should be able to run something like this (code taken from the official MongoDB tutorial, using Briford Wylie's answer above):

    from pyspark.sql import SparkSession
    
    my_spark = SparkSession \
        .builder \
        .appName("myApp") \
        .config("spark.mongodb.input.uri", "mongodb://127.0.0.1:27017/spark.test_pyspark_mbd_conn") \
        .config("spark.mongodb.output.uri", "mongodb://127.0.0.1:27017/spark.test_pyspark_mbd_conn") \
        .config('spark.jars.packages', 'org.mongodb.spark:mongo-spark-connector_2.11:2.2.2') \
        .getOrCreate()
    
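    A quick way to confirm the jar is actually being picked up is to push a tiny DataFrame through the connector with that session and read it back; a minimal sketch, assuming a local MongoDB is reachable at the URIs configured above:

    # Write one row out via the connector, then read the collection back
    people = my_spark.createDataFrame([("Bilbo", 50)], ["name", "age"])
    people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()

    my_spark.read.format("com.mongodb.spark.sql.DefaultSource").load().printSchema()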

    Disclaimers:

    1) I don't know if this answer is the right place/SO question to put this; please advise of a better place and I will move it.

    2) If you think I have erred or have improvements to the process above, please comment and I will revise.
