How to connect to Amazon Redshift or other DB's in Apache Spark?

Backend · unresolved · 6 answers · 2076 views

Asked by 刺人心 on 2021-01-13 09:44

I'm trying to connect to Amazon Redshift via Spark, so I can join data we have on S3 with data on our RS cluster. I found some very spartan documentation here for the capab…

6 Answers
  •  [愿得一人]
    2021-01-13 10:05

    This worked for me in Scala, in AWS Glue with Spark 2.4:

    import com.amazonaws.services.glue.{DynamicFrame, GlueContext}
    import com.amazonaws.services.glue.util.Job
    import org.apache.spark.SparkContext
    import scala.collection.JavaConverters._
    
    val spark: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(spark)
    Job.init(args("JOB_NAME"), glueContext, args.asJava)
    
    // Read from Redshift over JDBC. The "dbtable" option accepts a
    // subquery aliased as a derived table, so any SELECT works here.
    val sqlContext = new org.apache.spark.sql.SQLContext(spark)
    val jdbcDF = sqlContext.read.format("jdbc").options(
      Map("url" -> "jdbc:postgresql://HOST:PORT/DBNAME?user=USERNAME&password=PASSWORD",
        "dbtable" -> "(SELECT a.row_name FROM schema_name.table_name a) as from_redshift")).load()
    
    // Convert the DataFrame back to a Glue DynamicFrame
    val datasource0 = DynamicFrame(jdbcDF, glueContext)
    

    This works with any SQL query, because JDBC's `dbtable` option accepts a parenthesized subquery with an alias in place of a table name.
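
    The subquery-as-table trick can be factored into a tiny helper that wraps an arbitrary query for the `dbtable` option. This is a sketch with hypothetical names (`JdbcQuery`, `asDbTable`), not part of any Spark or Glue API; it only builds the option strings, so it runs without a cluster:

    ```scala
    // Hypothetical helper: wrap an arbitrary SQL query so it can be
    // passed as the JDBC "dbtable" option, which expects a table name
    // or a parenthesized derived table with an alias.
    object JdbcQuery {
      def asDbTable(query: String, alias: String = "subq"): String =
        s"($query) as $alias"

      def options(url: String, query: String): Map[String, String] =
        Map("url" -> url, "dbtable" -> asDbTable(query))
    }

    // Usage: pass the resulting map to sqlContext.read.format("jdbc").options(...)
    val opts = JdbcQuery.options(
      "jdbc:postgresql://HOST:PORT/DBNAME?user=USERNAME&password=PASSWORD",
      "SELECT a.row_name FROM schema_name.table_name a")
    println(opts("dbtable"))
    // prints (SELECT a.row_name FROM schema_name.table_name a) as subq
    ```

    Keeping the query in one place also makes it easier to swap in the Redshift-specific driver URL (`jdbc:redshift://...`) later without touching the rest of the job.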
