I\'m trying to connect to Amazon Redshift via Spark, so I can join data we have on S3 with data on our RS cluster. I found some very spartan documentation here for the capab
This worked for in Scala in AWS Glue with Spark 2.4:
val spark: SparkContext = new SparkContext()
val glueContext: GlueContext = new GlueContext(spark)
Job.init(args("JOB_NAME"), glueContext, args.asJava)
val sqlContext = new org.apache.spark.sql.SQLContext(spark)
val jdbcDF = sqlContext.read.format("jdbc").options(
Map("url" -> "jdbc:postgresql://HOST:PORT/DBNAME?user=USERNAME&password=PASSWORD",
"dbtable" -> "(SELECT a.row_name FROM schema_name.table_name a) as from_redshift")).load()
// back to DynamicFrame
val datasource0 = DynamicFrame(jdbcDF, glueContext)
Works with any SQL query.