Adding JDBC driver to Spark on EMR


Question


I'm trying to add a JDBC driver to a Spark cluster that is executing on top of Amazon EMR, but I keep getting the following exception:

java.sql.SQLException: No suitable driver found

I tried the following things:

  1. Using addJar to add the driver JAR explicitly from the code.
  2. Using the spark.executor.extraClassPath and spark.driver.extraClassPath parameters.
  3. Using spark.driver.userClassPathFirst=true. With this option I get a different error because of a mix of dependencies with Spark; in any case, it seems too aggressive when all I want is to add a single JAR. (A sketch of how these options are passed is shown after this list.)
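
For reference, a minimal sketch of how options 2 and 3 would typically be passed to spark-submit (the driver JAR path and main class here are just placeholders, not my real values):

spark-submit \
  --conf spark.driver.extraClassPath=/home/hadoop/jars/my-jdbc-driver.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/jars/my-jdbc-driver.jar \
  --conf spark.driver.userClassPathFirst=true \
  --class com.example.MySparkJob my-spark-job.jar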

Could you please help me with this? How can I introduce the driver to the Spark cluster easily?

Thanks,

David

Source code of the application

val properties = new Properties()
properties.put("ssl", "***")
properties.put("user", "***")
properties.put("password", "***")
properties.put("account", "***")
properties.put("db", "***")
properties.put("schema", "***")
properties.put("driver", "***")

val conf = new SparkConf().setAppName("***")
      .setMaster("yarn-cluster")
      .setJars(JavaSparkContext.jarOfClass(this.getClass()))

val sc = new SparkContext(conf)
sc.addJar(args(0))
val sqlContext = new SQLContext(sc)

var df = sqlContext.read.jdbc(connectStr, "***", properties = properties)
df = df.select( Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***)
// Additional actions on df

Answer 1:


I had the same problem. What ended up working for me was the --driver-class-path parameter passed to spark-submit.

The main thing is to add the entire Spark classpath to --driver-class-path.

Here are my steps:

  1. I got the default driver classpath by reading the value of the "spark.driver.extraClassPath" property from the Spark History Server under "Environment".
  2. Copied the MySQL JAR file to each node in the EMR cluster.
  3. Put the MySQL JAR path at the front of the --driver-class-path argument to the spark-submit command and appended the value of "spark.driver.extraClassPath" to it.

My driver class path ended up looking like this:

--driver-class-path /home/hadoop/jars/mysql-connector-java-5.1.35.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*

This worked with EMR 4.1 using Java with Spark 1.5.0. I had already added the MySQL JAR as a dependency in the Maven pom.xml.
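
Putting the steps together, the submit command ends up roughly in this shape (the main class and application JAR names are placeholders, and the classpath template stands for the full value shown above):

spark-submit \
  --class com.example.MySparkJob \
  --driver-class-path /home/hadoop/jars/mysql-connector-java-5.1.35.jar:<default spark.driver.extraClassPath value> \
  my-spark-job.jar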

You may also want to look at this answer as it seems like a cleaner solution. I haven't tried it myself.




Answer 2:


With EMR 5.2 I add any new jars to the original driver classpath with:

export MY_DRIVER_CLASS_PATH=my_jdbc_jar.jar:some_other_jar.jar:$(grep spark.driver.extraClassPath /etc/spark/conf/spark-defaults.conf | awk '{print $2}')

and after that

spark-submit --driver-class-path $MY_DRIVER_CLASS_PATH
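
With placeholder class and JAR names, the full invocation then looks something like:

spark-submit \
  --driver-class-path "$MY_DRIVER_CLASS_PATH" \
  --class com.example.MySparkJob \
  my-spark-job.jar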


Source: https://stackoverflow.com/questions/32756335/adding-jdbc-driver-to-spark-on-emr
