How to add third-party Java jars for use in PySpark

没有蜡笔的小新 2020-11-29 03:08

I have some third-party database client libraries in Java. I want to access them through java_gateway.py.

E.g.: to make the client class (not

9 answers
  •  被撕碎了的回忆
    2020-11-29 03:29

    None of the above answers worked for me.

    What I had to do with the pyspark shell was:

    pyspark --py-files /path/to/jar/xxxx.jar
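    A note on the command above: --py-files ships the file and puts it on the Python path, which is what fixes an ImportError when the jar also bundles Python modules (as the graphframes jar from the issue linked below does). Java classes that the JVM itself has to load, such as a database client library, normally go on the driver and executor classpaths with --jars. The sketch below assumes you start the shell from a Python script; build_pyspark_args is a hypothetical helper, not part of PySpark, and it passes the jars to --py-files as well only when asked.

```python
import shlex
from typing import List, Sequence


def build_pyspark_args(jars: Sequence[str], python_from_jars: bool = False) -> List[str]:
    """Build command-line arguments for the pyspark shell (hypothetical helper).

    --jars puts the jars on the driver and executor JVM classpaths.
    --py-files also ships them and adds them to the Python path, which only
    helps if the jars contain Python modules (e.g. graphframes).
    """
    if not jars:
        raise ValueError("at least one jar path is required")
    joined = ",".join(jars)  # both options take a comma-separated list
    args = ["pyspark", "--jars", joined]
    if python_from_jars:
        args += ["--py-files", joined]
    return args


if __name__ == "__main__":
    # Placeholder path kept from the answer above.
    cmd = build_pyspark_args(["/path/to/jar/xxxx.jar"], python_from_jars=True)
    print(shlex.join(cmd))
    # subprocess.run(cmd) would launch the shell with these options.
```

    Running it prints: pyspark --jars /path/to/jar/xxxx.jar --py-files /path/to/jar/xxxx.jar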
    

    For Jupyter Notebook:

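    from pyspark.sql import SparkSession

    spark = (SparkSession
        .builder
        .appName("Spark_Test")
        # "yarn-client" is deprecated since Spark 2.0; use "yarn" (client deploy mode is the default)
        .master("yarn")
        .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
        .config("spark.executor.cores", "4")
        .config("spark.executor.instances", "2")
        .config("spark.sql.shuffle.partitions", "8")
        .enableHiveSupport()
        .getOrCreate())

    # Ship the jar with the job and add it to the Python path on the driver and executors
    spark.sparkContext.addPyFile("/path/to/jar/xxxx.jar")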
    

    Link to the source where I found it: https://github.com/graphframes/graphframes/issues/104
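    In the notebook case, addPyFile likewise only affects the Python side. A minimal sketch for making the Java classes themselves loadable, assuming a fresh kernel: spark.jars is read when the SparkContext starts, so it has to be set on the builder before getOrCreate() and is not picked up by a session that is already running. The helper names jar_config, java_class and create_session_with_jars are hypothetical, and the class name in the usage example is a placeholder.

```python
from functools import reduce
from typing import Any, Dict, Sequence


def jar_config(jars: Sequence[str]) -> Dict[str, str]:
    """Spark options that put the given jars on the JVM classpath.

    spark.jars is read when the SparkContext starts, so it must be set on the
    builder before getOrCreate(); an already-running session ignores it.
    """
    return {"spark.jars": ",".join(jars)}


def java_class(jvm: Any, fqcn: str) -> Any:
    """Resolve a fully qualified Java class name on a Py4J JVM view.

    Equivalent to writing jvm.com.example.db.Client by hand. Py4J does not
    fail here if the class is missing: it returns a JavaPackage, and the
    error only appears when that object is called.
    """
    return reduce(getattr, fqcn.split("."), jvm)


def create_session_with_jars(app_name: str, jars: Sequence[str]) -> Any:
    """Start a SparkSession whose driver and executors can load the jars."""
    from pyspark.sql import SparkSession  # imported lazily; needs PySpark installed

    builder = SparkSession.builder.appName(app_name)
    for key, value in jar_config(jars).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

    Example usage (placeholder path and hypothetical class name):

        spark = create_session_with_jars("Spark_Test", ["/path/to/jar/xxxx.jar"])
        Client = java_class(spark.sparkContext._jvm, "com.example.db.Client")
        client = Client()  # calls the Java constructor through Py4J

    spark.sparkContext._jvm is backed by the gateway that PySpark's java_gateway.py launches. It is an internal attribute rather than a public API, and if the class is missing from the classpath, Py4J hands back a JavaPackage, so the failure only shows up as "'JavaPackage' object is not callable" when you call it.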
