Question
I'm trying to add a JDBC driver to a Spark cluster running on top of Amazon EMR, but I keep getting the following exception:
java.sql.SQLException: No suitable driver found for
I tried the following:
- Using addJar to add the driver JAR explicitly from the code.
- Using the spark.executor.extraClassPath and spark.driver.extraClassPath parameters.
- Using spark.driver.userClassPathFirst=true. With this option I get a different error because of a mix of dependencies with Spark. In any case, this option seems too aggressive if I just want to add a single JAR.
Could you please help me with this? How can I easily introduce the driver to the Spark cluster?
Thanks,
David
Source code of the application
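import java.util.Properties
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.SQLContext

val properties = new Properties()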
properties.put("ssl", "***")
properties.put("user", "***")
properties.put("password", "***")
properties.put("account", "***")
properties.put("db", "***")
properties.put("schema", "***")
properties.put("driver", "***")
val conf = new SparkConf().setAppName("***")
  .setMaster("yarn-cluster")
  .setJars(JavaSparkContext.jarOfClass(this.getClass()))
val sc = new SparkContext(conf)
sc.addJar(args(0))
val sqlContext = new SQLContext(sc)
var df = sqlContext.read.jdbc(connectStr, "***", properties = properties)
df = df.select(Constants.***,
  Constants.***,
  Constants.***,
  Constants.***,
  Constants.***,
  Constants.***,
  Constants.***,
  Constants.***,
  Constants.***)
// Additional actions on df
Answer 1:
I had the same problem. What ended up working for me was passing the --driver-class-path parameter to spark-submit.
The main thing is to include the entire Spark class path in the --driver-class-path value, not just the driver JAR.
Here are my steps:
- I got the default driver class path by getting the value of the "spark.driver.extraClassPath" property from the Spark History Server under "Environment".
- Copied the MySQL JAR file to each node in the EMR cluster.
- Put the MySQL JAR path at the front of the --driver-class-path argument to the spark-submit command and appended the value of "spark.driver.extraClassPath" to it.
My driver class path ended up looking like this:
--driver-class-path /home/hadoop/jars/mysql-connector-java-5.1.35.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
This worked with EMR 4.1 using Java with Spark 1.5.0. I had already added the MySQL JAR as a dependency in the Maven pom.xml.
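The steps above can be sketched as a small shell helper. This is only an illustration: the function name prepend_to_class_path is hypothetical, it assumes the driver JAR has already been copied to the same path on every node, and the existing classpath value has to be taken from your own cluster's "Environment" page.

```shell
# Hypothetical helper: put a JDBC driver JAR in front of an existing classpath.
# $1 = path to the driver JAR on this node
# $2 = current value of spark.driver.extraClassPath (may be empty)
prepend_to_class_path() {
  # Fail early if the JAR is not readable on this node
  if [ ! -r "$1" ]; then
    echo "driver JAR not found: $1" >&2
    return 1
  fi
  if [ -n "$2" ]; then
    # Driver JAR first, then the classpath Spark was already using
    printf '%s:%s\n' "$1" "$2"
  else
    printf '%s\n' "$1"
  fi
}

# Example usage (EXISTING_CP, the class name, and the application JAR are placeholders):
# DRIVER_CP=$(prepend_to_class_path /home/hadoop/jars/mysql-connector-java-5.1.35.jar "$EXISTING_CP")
# spark-submit --driver-class-path "$DRIVER_CP" --class com.example.Main my-app.jar
```

Quoting "$DRIVER_CP" keeps the shell from touching the * wildcards, so they reach the JVM unchanged.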
You may also want to look at this answer as it seems like a cleaner solution. I haven't tried it myself.
Answer 2:
With EMR 5.2, I add any new JARs to the original driver classpath with:
export MY_DRIVER_CLASS_PATH=my_jdbc_jar.jar:some_other_jar.jar$(grep spark.driver.extraClassPath /etc/spark/conf/spark-defaults.conf | awk '{print $2}')
and then run:
spark-submit --driver-class-path "$MY_DRIVER_CLASS_PATH"
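A slightly more defensive variant of the lookup is sketched below. It is an assumption-laden illustration rather than a tested EMR recipe: the function names read_spark_conf_value and build_driver_class_path are hypothetical, and the sketch assumes the config file holds whitespace-separated "key value" lines. It matches the property name exactly, and it joins the parts with a single ":" whether or not the configured value itself starts with one.

```shell
# Hypothetical helper: print the value of one property from a spark-defaults.conf-style file.
# $1 = property name, $2 = path to the config file
read_spark_conf_value() {
  # Exact match on the first field, so comments and similarly named keys are ignored
  awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Hypothetical helper: extra JARs first, then the configured driver classpath (if any).
# $1 = colon-separated list of extra JARs, $2 = path to the config file
build_driver_class_path() {
  existing=$(read_spark_conf_value spark.driver.extraClassPath "$2")
  # Drop a leading ":" so the join below never produces an empty entry
  existing=${existing#:}
  # Append ":<existing>" only when the property was found and non-empty
  printf '%s\n' "$1${existing:+:$existing}"
}

# Example usage (the class name and application JAR are placeholders):
# MY_DRIVER_CLASS_PATH=$(build_driver_class_path my_jdbc_jar.jar:some_other_jar.jar /etc/spark/conf/spark-defaults.conf)
# spark-submit --driver-class-path "$MY_DRIVER_CLASS_PATH" --class com.example.Main my-app.jar
```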
Source: https://stackoverflow.com/questions/32756335/adding-jdbc-driver-to-spark-on-emr