How to add third party java jars for use in pyspark

没有蜡笔的小新 2020-11-29 03:08

I have some third-party database client libraries in Java. I want to access them through java_gateway.py.

E.g: to make the client class (not

9 answers
  •  感动是毒
    2020-11-29 03:36

    When loading Java/Scala libs from PySpark, neither --jars nor spark.jars works in version 2.4.0 and earlier (I didn't check newer versions), at least not with the usual _jvm lookup shown below. I'm surprised how many people claim that it works.
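
For context, here is a minimal sketch of how jars are usually handed to PySpark from Python via --jars. It assumes the JVM has not been started yet in this process, because pyspark's java_gateway.py reads the PYSPARK_SUBMIT_ARGS environment variable only when it launches the gateway. The helper name set_pyspark_jars and the jar paths are hypothetical, and paths containing spaces are not handled.

```python
import os


def set_pyspark_jars(jar_paths):
    """Hypothetical helper: pass jars to the JVM that pyspark launches.

    Must run before the first SparkSession/SparkContext is created, because
    pyspark reads PYSPARK_SUBMIT_ARGS only when it starts the Java gateway.
    Paths containing spaces are not handled (the args are split on whitespace).
    """
    # --jars (and spark.jars) expect a comma-separated list of paths.
    jars = ",".join(os.path.abspath(p) for p in jar_paths)
    # The argument string is expected to end with "pyspark-shell".
    os.environ["PYSPARK_SUBMIT_ARGS"] = f"--jars {jars} pyspark-shell"
    return jars
```

The returned string can equally be passed as SparkSession.builder.config("spark.jars", jars) before calling getOrCreate(). Either way, the classes end up behind the child classloader discussed further down, so the plain _jvm lookup may still not find them.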

    The main problem is the classloader that is used when you retrieve a class in the following way:

    from pyspark.sql import SparkSession

    jvm = SparkSession.builder.getOrCreate()._jvm
    clazz = jvm.my.scala.MyClass  # hypothetical class name ("class" is a Python keyword)
    # or
    clazz = jvm.java.lang.Class.forName('my.scala.MyClass')
    

    it works only when you copy the jar files to ${SPARK_HOME}/jars (that approach works for me).
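
A minimal sketch of that workaround, assuming SPARK_HOME points at the Spark installation PySpark actually launches (for a pip-installed pyspark this is the package directory, which also contains jars/). The helper name copy_jars_to_spark_home is hypothetical. On a cluster, every node would need its own copy, and the SparkSession (and its JVM) must be restarted afterwards.

```python
import os
import shutil
from pathlib import Path


def copy_jars_to_spark_home(jar_paths, spark_home=None):
    """Hypothetical helper: copy jars into ${SPARK_HOME}/jars.

    Jars in that directory are put on the classpath of Spark JVMs launched
    from this installation, so the plain _jvm lookup can see their classes
    once the JVM is restarted.
    """
    home = Path(spark_home or os.environ["SPARK_HOME"])
    target = home / "jars"
    if not target.is_dir():
        raise FileNotFoundError(f"{target} does not exist; is SPARK_HOME correct?")
    copied = []
    for jar in map(Path, jar_paths):
        dest = target / jar.name
        shutil.copy2(jar, dest)  # overwrites an existing jar with the same name
        copied.append(dest)
    return copied
```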

    But when your only option is --jars or spark.jars, the classes are loaded by a different classloader (a child classloader) that is set as the current thread's context classloader. So your Python code needs to look like this:

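    object_name = "my.scala.MyObject"  # hypothetical; the trailing "$" below selects a Scala object's compiled class
    clazz = jvm.java.lang.Thread.currentThread().getContextClassLoader().loadClass(f"{object_name}$")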
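
The same lookup can be wrapped in small helpers. This is a sketch under stated assumptions: the helper names are hypothetical, jvm is the py4j JVM view from the SparkSession (e.g. spark._jvm), and the jar was passed with --jars or spark.jars. For a Scala object it also fetches the singleton through the static MODULE$ field that the Scala compiler generates for the Foo$ class; py4j sends Python None to Java as null, which Field.get expects for a static field.

```python
def load_via_context_loader(jvm, class_name, scala_object=False):
    """Load a class through the current thread's context classloader.

    `jvm` is a py4j JVM view such as spark._jvm. For a Scala `object`,
    the compiled class name carries a trailing "$", which is appended
    when scala_object=True.
    """
    loader = jvm.java.lang.Thread.currentThread().getContextClassLoader()
    name = f"{class_name}$" if scala_object else class_name
    return loader.loadClass(name)


def get_scala_object(jvm, object_name):
    """Return the singleton instance of a Scala `object`.

    Scala compiles `object Foo` to a class `Foo$` whose public static field
    MODULE$ holds the instance; Field.get(null) reads a static field.
    """
    clazz = load_via_context_loader(jvm, object_name, scala_object=True)
    return clazz.getField("MODULE$").get(None)
```

Usage would look like get_scala_object(spark._jvm, "my.scala.MyObject"), with a hypothetical object name. For a plain Java class, load_via_context_loader returns a java.lang.Class, so creating instances presumably has to go through Java reflection rather than the jvm.my.scala.MyClass(...) shortcut.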
    

    Hope it explains your troubles. Give me a shout if not.
