Running a custom Java class in PySpark

野的像风 2020-12-05 16:58

I'm trying to run a custom HDFS reader class in PySpark. The class is written in Java, and I need to access it from PySpark, either from the shell or with spark-submit.

3 answers
  •  不思量自难忘°
    2020-12-05 17:25

    In PySpark, try the following:

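    from py4j.java_gateway import java_import
    from pyspark.sql import SparkSession

    # The pyspark shell already defines sc; in a spark-submit script, get (or create) it here
    sc = SparkSession.builder.getOrCreate().sparkContext

    # Make org.foo.module.Foo reachable by its simple name on the Py4J JVM view
    java_import(sc._gateway.jvm, "org.foo.module.Foo")
    func = sc._gateway.jvm.Foo()  # instantiate the Java class via Py4J
    func.fooMethod()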
    

    Make sure you have compiled your Java code and packaged it into a JAR file, then submit the Spark job like this (the pyspark shell accepts the same --driver-class-path and --jars options):

    spark-submit --driver-class-path "name_of_your_jar_file.jar" --jars "name_of_your_jar_file.jar" name_of_your_python_file.py
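
    If you launch the job from a Python wrapper script, or need several jars, the same command can be assembled programmatically. The sketch below is illustrative only: build_submit_cmd is a hypothetical helper, and the file names are the placeholders from the command above. The two flags take lists in different forms: --jars expects a comma-separated list, while --driver-class-path takes an ordinary Java classpath (entries joined with ':' on Linux/macOS and ';' on Windows, which is what os.pathsep gives).

```python
import os
import shlex


def build_submit_cmd(app_py, jars, spark_submit="spark-submit"):
    """Return a spark-submit argument list for a PySpark app that needs custom jars.

    Hypothetical helper for illustration; only the flag names come from spark-submit.
    """
    jars = list(jars)
    if not jars:
        raise ValueError("at least one jar is required")
    return [
        spark_submit,
        # Driver classpath: a normal Java classpath, so use the platform separator
        "--driver-class-path", os.pathsep.join(jars),
        # --jars: comma-separated list of jars shipped with the application
        "--jars", ",".join(jars),
        app_py,
    ]


if __name__ == "__main__":
    cmd = build_submit_cmd("name_of_your_python_file.py", ["name_of_your_jar_file.jar"])
    # Print a copy-pasteable shell command (shlex.join needs Python 3.8+)
    print(shlex.join(cmd))
    # prints: spark-submit --driver-class-path name_of_your_jar_file.jar --jars name_of_your_jar_file.jar name_of_your_python_file.py
```

    To actually launch the job, pass the list to subprocess.run(cmd, check=True); passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.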
    
