How to connect to Amazon Redshift or other DB's in Apache Spark?

Backend · unresolved · 6 answers · 2076 views

Asked by 刺人心 on 2021-01-13 09:44

I'm trying to connect to Amazon Redshift via Spark, so I can join data we have on S3 with data on our RS cluster. I found some very spartan documentation here for the capab…

6 Answers
  •  [愿得一人]
    2021-01-13 10:05

    This worked for me in Scala, in AWS Glue with Spark 2.4:

    import com.amazonaws.services.glue.{DynamicFrame, GlueContext}
    import com.amazonaws.services.glue.util.Job
    import org.apache.spark.SparkContext
    import scala.collection.JavaConverters._
    
    val spark: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(spark)
    Job.init(args("JOB_NAME"), glueContext, args.asJava)
    
    // Read from Redshift over JDBC. The "dbtable" option accepts a
    // subquery aliased as a derived table, so any SELECT works here.
    val sqlContext = new org.apache.spark.sql.SQLContext(spark)
    val jdbcDF = sqlContext.read.format("jdbc").options(
      Map("url" -> "jdbc:postgresql://HOST:PORT/DBNAME?user=USERNAME&password=PASSWORD",
        "dbtable" -> "(SELECT a.row_name FROM schema_name.table_name a) as from_redshift")).load()
    
    // Convert the DataFrame back to a Glue DynamicFrame
    val datasource0 = DynamicFrame(jdbcDF, glueContext)
    

    This works with any SQL query, because JDBC's `dbtable` option accepts a parenthesized subquery with an alias in place of a table name.
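
    The subquery-as-table trick can be factored into a tiny helper that wraps an arbitrary query for the `dbtable` option. This is a sketch with hypothetical names (`JdbcQuery`, `asDbTable`), not part of any Spark or Glue API; it only builds the option strings, so it runs without a cluster:

    ```scala
    // Hypothetical helper: wrap an arbitrary SQL query so it can be
    // passed as the JDBC "dbtable" option, which expects a table name
    // or a parenthesized derived table with an alias.
    object JdbcQuery {
      def asDbTable(query: String, alias: String = "subq"): String =
        s"($query) as $alias"

      def options(url: String, query: String): Map[String, String] =
        Map("url" -> url, "dbtable" -> asDbTable(query))
    }

    // Usage: pass the resulting map to sqlContext.read.format("jdbc").options(...)
    val opts = JdbcQuery.options(
      "jdbc:postgresql://HOST:PORT/DBNAME?user=USERNAME&password=PASSWORD",
      "SELECT a.row_name FROM schema_name.table_name a")
    println(opts("dbtable"))
    // prints (SELECT a.row_name FROM schema_name.table_name a) as subq
    ```

    Keeping the query in one place also makes it easier to swap in the Redshift-specific driver URL (`jdbc:redshift://...`) later without touching the rest of the job.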
