“No suitable driver” error when using sparklyr::spark_read_jdbc to query Azure SQL database from Azure Databricks


Question


I am trying to connect to an Azure SQL database from a Databricks notebook using the sparklyr::spark_read_jdbc function. I am an analyst with no computer science background (beyond R and SQL) and no previous experience with Spark or JDBC (I have previously used local instances of R to connect to the same database via ODBC), so I apologise if I've misunderstood something vital.

My code is:

library(sparklyr)
library(dplyr)

sc <- spark_connect(method = "databricks")

config <- spark_config()

db_tbl <- sc %>%
  spark_read_jdbc(name    = "myresults",
                  options = list(url      = "mysqlserver.database.windows.net",
                                 user     = "adminuser",
                                 password = "adminpassword",
                                 dbtable  = "(SELECT
                                                [Post_Sector],
                                                [People],
                                                [Field_1],
                                                [Field_2],
                                                [Field_3]
                                              FROM [myschema].[mytable]) as my_query"))

This results in the following error:

Error : java.sql.SQLException: No suitable driver
  at java.sql.DriverManager.getDriver(DriverManager.java:315)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
  at scala.Option.getOrElse(Option.scala:121)

(I have truncated the list of locations at which a suitable driver can't be found as it is very long.)
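
For context: java.sql.DriverManager selects a driver by matching the scheme at the start of the JDBC URL, and a bare hostname like "mysqlserver.database.windows.net" has no jdbc:sqlserver:// prefix for any driver to match. Below is a minimal sketch of the conventional options list for SQL Server; it assumes the Microsoft JDBC driver from sqljdbc42.jar is attached to the cluster, and "mydatabase" is a placeholder (the other values are the question's own placeholders):

# Sketch only: the jdbc:sqlserver:// URL prefix and the explicit driver
# class are the pieces absent from the options list above.
db_tbl <- spark_read_jdbc(
  sc,
  name    = "myresults",
  options = list(
    url      = "jdbc:sqlserver://mysqlserver.database.windows.net:1433;databaseName=mydatabase",
    user     = "adminuser",
    password = "adminpassword",
    driver   = "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    dbtable  = "(SELECT [Post_Sector], [People] FROM [myschema].[mytable]) as my_query"
  )
)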

I have installed "azure_sqldb_spark_1_0_2_jar_with_dependencies.jar" and "sqljdbc42.jar" to the cluster, as well as the Maven library "com.microsoft.azure:azure-sqldb-spark:1.0.2".

I have also tried specifying the driver location as follows:

 config$`sparklyr.shell.driver-class-path` <- "dbfs:/FileStore/jars/3db936ce_5bda_4344_b102_32c0dcae2f87-azure_sqldb_spark_1_0_2_jar_with_dependencies-7114d.jar"

but this does not prevent the error message.
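
One observation worth noting (a sketch, not a confirmed fix): as written, config is created after spark_connect() has already run and is never passed to it, so the driver-class-path setting cannot reach the session. For the setting to have any chance of applying, the connection would have to be created with the config object:

# Sketch: spark_config() settings take effect only when the config object
# is supplied to spark_connect() before the session is created.
config <- spark_config()
config$`sparklyr.shell.driver-class-path` <- "dbfs:/FileStore/jars/3db936ce_5bda_4344_b102_32c0dcae2f87-azure_sqldb_spark_1_0_2_jar_with_dependencies-7114d.jar"

sc <- spark_connect(method = "databricks", config = config)

That said, with method = "databricks" sparklyr attaches to the cluster's already-running Spark session, so shell-level options may be ignored regardless; installing the driver as a cluster library, as described above, is the usual route.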

I can connect to the database fine using:

%scala

//Connect to database:

import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Acquire a DataFrame collection (val collection)

val my_connection = Config(Map(
  "url"          -> "mysqlserver.database.windows.net",
  "databaseName" -> "mydatabase",
  "dbTable"      -> "mytable",
  "user"         -> "adminuser",
  "password"     -> "adminpassword"
))

val collection = sqlContext.read.sqlDB(my_connection)
collection.show()

So the connection credentials are not the problem. However, I don't know how to use sparklyr to access this connection and run SQL queries against it to generate R dataframes (or Spark dataframes that I can convert to R dataframes), so I'm still hoping I can get the sparklyr connection to work.
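
As a hedged sketch (not part of the original question): one way to bridge the working Scala connection into R is to register the Scala DataFrame as a temporary view, e.g. by adding collection.createOrReplaceTempView("myresults") to the Scala cell, and then query that view from sparklyr. The view name "myresults" is an illustrative assumption:

# Assumes the Scala cell above has also run:
#   collection.createOrReplaceTempView("myresults")
# The view name "myresults" is illustrative, not from the question.
library(sparklyr)
library(dplyr)

sc <- spark_connect(method = "databricks")

# Lazy reference to the registered view as a Spark dataframe:
spark_df <- tbl(sc, "myresults")

# Or run SQL against the view and collect the result into a local R dataframe:
r_df <- sdf_sql(sc, "SELECT Post_Sector, People FROM myresults") %>%
  collect()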

Thanks in advance for any advice!

Source: https://stackoverflow.com/questions/56756844/no-suitable-driver-error-when-using-sparklyrspark-read-jdbc-to-query-azure-s
