Question
I am trying to connect to an Azure SQL DB from a Databricks notebook using the sparklyr::spark_read_jdbc function. I am an analyst with no computer science background (beyond R and SQL) and no previous experience with Spark or JDBC (I have previously used local instances of R to connect to the same SQL database via ODBC), so I apologise if I've misunderstood something vital.
My code is:
library(sparklyr)
library(dplyr)

config <- spark_config()

sc <- spark_connect(method = "databricks")

# Read the result of a pushed-down query into a Spark DataFrame
db_tbl <- spark_read_jdbc(sc,
                          name = "myresults",
                          options = list(url = "mysqlserver.database.windows.net",
                                         user = "adminuser",
                                         password = "adminpassword",
                                         dbtable = "(SELECT
                                                     [Post_Sector]
                                                     ,[People]
                                                     ,[Field_1]
                                                     ,[Field_2]
                                                     ,[Field_3]
                                                     FROM [myschema].[mytable]) as my_query"))
Which results in the error:
Error : java.sql.SQLException: No suitable driver
  at java.sql.DriverManager.getDriver(DriverManager.java:315)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
  at scala.Option.getOrElse(Option.scala:121)
(I have truncated the list of locations at which a suitable driver can't be found as it is very long.)
I have installed "azure_sqldb_spark_1_0_2_jar_with_dependencies.jar" and "sqljdbc42.jar" on the cluster, as well as the Maven library "com.microsoft.azure:azure-sqldb-spark:1.0.2".
I have also tried specifying the driver location explicitly:
config$`sparklyr.shell.driver-class-path` <- "dbfs:/FileStore/jars/3db936ce_5bda_4344_b102_32c0dcae2f87-azure_sqldb_spark_1_0_2_jar_with_dependencies-7114d.jar"
but this does not prevent the error message.
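From the Spark JDBC documentation, my understanding is that DriverManager matches a registered driver against the url string, so a SQL Server url should start with jdbc:sqlserver:// and the driver class may need to be named explicitly. Is the options list supposed to look something like this sketch? (Untested; the server, database name and credentials are placeholders for my real values.)

db_tbl <- spark_read_jdbc(sc,
                          name = "myresults",
                          options = list(
                            # jdbc:sqlserver:// prefix plus the databaseName property
                            url = "jdbc:sqlserver://mysqlserver.database.windows.net:1433;databaseName=mydatabase",
                            user = "adminuser",
                            password = "adminpassword",
                            # driver class shipped in sqljdbc42.jar
                            driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver",
                            dbtable = "(SELECT [Post_Sector], [People], [Field_1], [Field_2], [Field_3]
                                        FROM [myschema].[mytable]) as my_query"))

If that is the right shape, I'm also not sure whether the driver option is still required once the jars are installed on the cluster.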
I can connect to the database fine using:
%scala
//Connect to database:
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._
// Acquire a DataFrame collection (val collection)
val my_connection = Config(Map(
"url" -> "mysqlserver.database.windows.net",
"databaseName" -> "mydatabase",
"dbTable" -> "mytable",
"user" -> "adminuser",
"password" -> "adminpassword"
))
val collection = sqlContext.read.sqlDB(my_connection)
collection.show()
So the connection credentials are not the problem. However, I don't know how to use sparklyr to access this Scala connection and run SQL queries against it to produce R data frames (or Spark DataFrames that I can convert to R data frames), so I'm still hoping I can get the connection via sparklyr to work.
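The closest workaround I can think of (untested, and the name my_view is my own placeholder) is to register the Scala DataFrame as a temporary view, which I believe the R cells can then see because both languages share the same Spark session:

%scala
// Expose the connected DataFrame to other notebook languages
collection.createOrReplaceTempView("my_view")

Then, in an R cell:

library(sparklyr)
library(dplyr)

# Reference the registered view lazily, then pull it into an R data frame
my_df <- tbl(sc, "my_view") %>% collect()

But even if that works, I would still prefer a direct sparklyr connection.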
Thanks in advance for any advice!
Source: https://stackoverflow.com/questions/56756844/no-suitable-driver-error-when-using-sparklyrspark-read-jdbc-to-query-azure-s