Question
I am struggling to load classes from JARs into my Scala-Spark kernel Jupyter notebook. I have jars at this location:
/home/hadoop/src/main/scala/com/linkedin/relevance/isolationforest/
with contents listed as follows:
-rwx------ 1 hadoop hadoop 7170 Sep 11 20:54 BaggedPoint.scala
-rw-rw-r-- 1 hadoop hadoop 186719 Sep 11 21:36 isolation-forest_2.3.0_2.11-1.0.1.jar
-rw-rw-r-- 1 hadoop hadoop 1482 Sep 11 21:36 isolation-forest_2.3.0_2.11-1.0.1-javadoc.jar
-rw-rw-r-- 1 hadoop hadoop 20252 Sep 11 21:36 isolation-forest_2.3.0_2.11-1.0.1-sources.jar
-rwx------ 1 hadoop hadoop 16133 Sep 11 20:54 IsolationForestModelReadWrite.scala
-rwx------ 1 hadoop hadoop 5740 Sep 11 20:54 IsolationForestModel.scala
-rwx------ 1 hadoop hadoop 4057 Sep 11 20:54 IsolationForestParams.scala
-rwx------ 1 hadoop hadoop 11301 Sep 11 20:54 IsolationForest.scala
-rwx------ 1 hadoop hadoop 7990 Sep 11 20:54 IsolationTree.scala
drwxrwxr-x 2 hadoop hadoop 157 Sep 11 21:35 libs
-rwx------ 1 hadoop hadoop 1731 Sep 11 20:54 Nodes.scala
-rwx------ 1 hadoop hadoop 854 Sep 11 20:54 Utils.scala
When I attempt to load the IsolationForest class like so:
import com.linkedin.relevance.isolationforest.IsolationForest
I get the following error in my notebook:
<console>:33: error: object linkedin is not a member of package com
import com.linkedin.relevance.isolationforest.IsolationForest
I've been Googling for several hours now to get to this point but am unable to progress further. What is the next step?
By the way, I am attempting to use this package: https://github.com/linkedin/isolation-forest
Thank you.
Answer 1:
For Scala:
if you're using spylon-kernel, then you can specify additional jars in the %%init_spark
section, as described in the docs (the first line below adds a jar file, the second adds a package):
%%init_spark
launcher.jars = ["/some/local/path/to/a/file.jar"]
launcher.packages = ["com.acme:super:1.0.1"]
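For example, a minimal sketch pointing at the local jar from the question (path taken from the directory listing above):
%%init_spark
launcher.jars = ["/home/hadoop/src/main/scala/com/linkedin/relevance/isolationforest/isolation-forest_2.3.0_2.11-1.0.1.jar"]
Once the kernel starts Spark with this configuration, the import com.linkedin.relevance.isolationforest.IsolationForest statement from the question should resolve.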
For Python:
in the first cells of the Jupyter notebook, before initializing the SparkSession, do the following:
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars <full_path_to>/isolation-forest_2.3.0_2.11-1.0.1.jar pyspark-shell'
This will add the jars into the PySpark context. But it's better to use --packages instead of --jars, because it will also fetch all the necessary dependencies and put everything into the internal cache. For example:
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.linkedin.isolation-forest:isolation-forest_2.3.0_2.11:1.0.0 pyspark-shell'
You only need to select the version that matches your PySpark and Scala versions (2.3.x and 2.4 use Scala 2.11, 3.0 uses Scala 2.12), as listed in the Git repo.
Answer 2:
I got the following to work with pure Scala, JupyterLab, and Almond (which uses Ammonite), with no Spark or any other heavy layer involved:
interp.load.cp(os.pwd/"yourfile.jar")
The above, added as a statement directly in the notebook, loads yourfile.jar from the current directory. After this you can import from the jar, for instance import yourfile._ if yourfile is the name of the top-level package. One caveat I observed: wait a bit until the kernel has started properly before attempting to load. If the first statement runs too fast (for instance with "restart and run all"), the whole thing hangs. This seems to be an unrelated issue.
You can, of course, construct another path (see the os-lib API for the available path operations). The Ammonite documentation on magic imports also covers how to load a package from Ivy and how to load a Scala script. The trick is to use the interp object and the LoadJar trait that you can access from it. LoadJar has the following API:
trait LoadJar {
  /**
   * Load a `.jar` file or directory into your JVM classpath
   */
  def cp(jar: os.Path): Unit
  /**
   * Load a `.jar` from a URL into your JVM classpath
   */
  def cp(jar: java.net.URL): Unit
  /**
   * Load one or more `.jar` files or directories into your JVM classpath
   */
  def cp(jars: Seq[os.Path]): Unit
  /**
   * Load a library from its maven/ivy coordinates
   */
  def ivy(coordinates: Dependency*): Unit
}
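Instead of loading a local jar, the ivy method (or Ammonite's $ivy magic import) can pull the library and its dependencies from Maven/Ivy. A sketch, assuming the coordinates from the Spark answer above (check the Git repo for the version matching your Scala/Spark build):
import $ivy.`com.linkedin.isolation-forest:isolation-forest_2.3.0_2.11:1.0.0`
import com.linkedin.relevance.isolationforest.IsolationForest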
Source: https://stackoverflow.com/questions/63854636/how-do-i-import-classes-from-one-or-more-local-jar-files-into-a-spark-scala-not