Question
In a Spark job, I don't know how to import and use the jars shared by the method SparkContext.addJar(). It seems this method is able to move jars to some location that is accessible to the other nodes in the cluster, but I do not know how to import them.
This is an example:
package utils;

public class addNumber {
    public int addOne(int i) {
        return i + 1;
    }

    public int addTwo(int i) {
        return i + 2;
    }
}
I create a class called addNumber and package it into a jar file, utils.jar.
Then I create a Spark job; the code is shown below:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object TestDependencies {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf
    val sc = new SparkContext(sparkConf)
    sc.addJar("/path/to/utils.jar")

    val data = (1 to 100).toList
    val rdd = sc.makeRDD(data)
    val rdd_1 = rdd.map { x =>
      val handler = new utils.addNumber
      handler.addOne(x)
    }

    rdd_1.collect().foreach { x => print(x + "||") }
  }
}
The error "java.lang.NoClassDefFoundError: utils/addNumber" is raised after the job is submitted with the spark-submit command.
I know that the addJar() method does not guarantee that the jars are included in the classpath of the Spark job. If I want to use the jar files, it seems I have to move all of the dependencies to the same path on every node of the cluster. But if I have to move and include all of the jars myself, what is the use of the addJar() method?
I am wondering if there is a way to use the jars imported by the addJar() method. Thanks in advance.
Answer 1:
Did you try setting the path of the jar with the "local:" prefix? From the documentation:
public void addJar(String path)
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
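For example, a minimal sketch using the local: prefix could look like this (it assumes utils.jar has already been copied to /opt/jars/utils.jar on every worker node; that path is only an illustration):

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

val sparkConf = new SparkConf()
val sc = new SparkContext(sparkConf)
// "local:" tells Spark the jar already exists at this path on every worker,
// so it is not shipped over the network, only added to the task classpath.
sc.addJar("local:/opt/jars/utils.jar")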
You can try this way as well:
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("tmp")
  .setJars(Array("/path1/one.jar", "/path2/two.jar"))
val sc = new SparkContext(conf)
Also take a look at the Spark configuration documentation and check the spark.jars option.
You can also set the --jars parameter in spark-submit:
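As a sketch, spark.jars can also be set directly on the SparkConf; the jar path below is only a placeholder:

val conf = new SparkConf()
  .setAppName("tmp")
  // comma-separated list of jars to ship to the driver and executors
  .set("spark.jars", "/path/to/utils.jar")
val sc = new SparkContext(conf)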
--jars /path/1.jar,/path/2.jar
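For instance, a complete submission might look like the following; the master URL, class name, and application jar are only placeholders:

spark-submit \
  --class TestDependencies \
  --master spark://master-host:7077 \
  --jars /path/1.jar,/path/2.jar \
  /path/to/your-application.jar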
or edit conf/spark-defaults.conf:
spark.driver.extraClassPath /path/1.jar:/fullpath/2.jar
spark.executor.extraClassPath /path/1.jar:/fullpath/2.jar
Source: https://stackoverflow.com/questions/42957696/what-is-use-of-method-addjar-in-spark