Spark-sqlserver connection

拟墨画扇 submitted on 2019-12-11 01:37:03

Question


Can we connect Spark to SQL Server? If so, how? I am new to Spark, and I want to connect to the server and work directly with the data in SQL Server instead of uploading .txt or .csv files. Please help. Thank you.


Answer 1:


Here are some code snippets. A DataFrame is used to create the table t2 and insert data. The SQLContext is then used to load the data from the t2 table back into a DataFrame. I added spark.driver.extraClassPath and spark.executor.extraClassPath to my spark-defaults.conf file so that Spark can find the SQL Server JDBC driver jar.
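For reference, the two classpath entries mentioned above might look like the fragment below. This is only a sketch: the jar path and the jar file name (sqljdbc4.jar) are assumptions and depend on where you put the Microsoft JDBC driver and which version you downloaded.

```properties
# spark-defaults.conf
# The path below is a hypothetical placeholder; point it at your driver jar.
spark.driver.extraClassPath    /path/to/sqljdbc4.jar
spark.executor.extraClassPath  /path/to/sqljdbc4.jar
```

The jar has to exist at that path on the driver machine and on every executor node.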

// Spark 1.4.1 (run in spark-shell, which provides sc and sqlContext)

// Needed for toDF(); spark-shell imports this automatically
import sqlContext.implicits._

// Insert data from a DataFrame

case class Conf(mykey: String, myvalue: String)

val data = sc.parallelize(Seq(
  Conf("1", "Delaware"),
  Conf("2", "Virginia"),
  Conf("3", "Maryland"),
  Conf("4", "South Carolina")
))

val df = data.toDF()

val url = "jdbc:sqlserver://wcarroll3:1433;database=mydb;user=ReportUser;password=ReportUser"

val table = "t2"

// The third argument (overwrite = true) replaces the table if it already exists
df.insertIntoJDBC(url, table, true)

// Load from the database using SQLContext (reuses the url defined above)

val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

// With numPartitions = 1 the table is read in a single query
val tbl = sqlContext.load("jdbc", Map(
  "url" -> url,
  "driver" -> driver,
  "dbtable" -> "t2",
  "partitionColumn" -> "mykey",
  "lowerBound" -> "0",
  "upperBound" -> "100",
  "numPartitions" -> "1"
))

tbl.show()

Some issues to consider:

Make sure the firewall allows traffic on port 1433. If you are using Microsoft Azure SQL Server DB, tables require a primary key. Some of the methods create the table, but Spark's code does not create the primary key, so the table creation fails.
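Both issues above can be checked or worked around with plain JDK calls before Spark is involved. The following is a minimal sketch, not the original author's method: the object name SqlServerChecks, the function names, and the column types in the CREATE TABLE statement are assumptions made for illustration. createKeyedTable also needs the SQL Server JDBC driver on the classpath.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}
import java.sql.DriverManager

// Hypothetical helper object; names are illustrative only.
object SqlServerChecks {

  // Returns true if a TCP connection to host:port opens within timeoutMs.
  // A false result often points at a firewall rule or a wrong host name.
  def canReach(host: String, port: Int = 1433, timeoutMs: Int = 3000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  // Creates t2 with an explicit primary key if it does not exist yet,
  // so that Spark only has to insert rows. Column types are assumptions.
  def createKeyedTable(url: String): Unit = {
    val conn = DriverManager.getConnection(url)
    try {
      val stmt = conn.createStatement()
      try {
        stmt.executeUpdate(
          """IF OBJECT_ID('t2', 'U') IS NULL
            |CREATE TABLE t2 (
            |  mykey   VARCHAR(50)  NOT NULL PRIMARY KEY,
            |  myvalue VARCHAR(255)
            |)""".stripMargin)
      } finally {
        stmt.close()
      }
    } finally {
      conn.close()
    }
  }
}
```

Once the table exists, one option is to append to it instead of letting Spark recreate it, for example with df.write.mode(SaveMode.Append).jdbc(url, "t2", new java.util.Properties) (available since Spark 1.4).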

Other details to take care of: https://docs.databricks.com/spark/latest/data-sources/sql-databases.html

Source: https://blogs.msdn.microsoft.com/bigdatasupport/2015/10/22/how-to-allow-spark-to-access-microsoft-sql-server/




Answer 2:


// Spark 2.x (in spark-shell, enter this with :paste so the chained calls stay together)
import org.apache.spark.SparkContext

// Create a DataFrame on top of a SQL Server database table
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Scala does not use backslashes for line continuation; chain the calls directly.
// Replace XXXXX.com, port, and the xxx placeholders with your own values.
val jdbcDF = sqlContext.read.format("jdbc")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("url", "jdbc:sqlserver://XXXXX.com:port;databaseName=xxx")
  .option("dbtable", "(SELECT * FROM xxxx) tmp")
  .option("user", "xxx")
  .option("password", "xxx")
  .load()

// Show sample records from the DataFrame
jdbcDF.show(5)
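If the same connection is used in several places, the options from the snippet above can be collected in one helper. This is only a sketch under assumptions: the object SqlServerJdbc, its options function, the host db.example.com, and the environment variable names in the usage comment are hypothetical and not part of the original answer.

```scala
// Hypothetical helper; builds the option map for Spark's "jdbc" data source.
object SqlServerJdbc {
  val Driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

  def options(host: String, port: Int, database: String, table: String,
              user: String, password: String): Map[String, String] =
    Map(
      "driver"   -> Driver,
      "url"      -> s"jdbc:sqlserver://$host:$port;databaseName=$database",
      "dbtable"  -> table,
      "user"     -> user,
      "password" -> password
    )
}

// Usage in spark-shell (Spark 2.x), where `spark` is the SparkSession:
//   val opts = SqlServerJdbc.options(
//     "db.example.com", 1433, "mydb", "(SELECT * FROM t2) tmp",
//     sys.env("SQLSERVER_USER"), sys.env("SQLSERVER_PASSWORD"))
//   val jdbcDF = spark.read.format("jdbc").options(opts).load()
```

Reading the credentials from the environment, as in the usage comment, keeps them out of the source file and out of the JDBC URL.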


Source: https://stackoverflow.com/questions/48295493/spark-sqlserver-connection
