How to connect to Amazon Redshift or other DBs in Apache Spark?

刺人心 2021-01-13 09:44

I'm trying to connect to Amazon Redshift via Spark, so I can join data we have on S3 with data on our RS cluster. I found some very spartan documentation here for the capab…

6 Answers
  •  青春惊慌失措
    2021-01-13 10:07

    It turns out you only need a username/password to access Redshift in Spark, and it is done as follows (using the Python API):

    from pyspark.sql import SQLContext

    # sc is the existing SparkContext (created automatically in the pyspark shell).
    sqlContext = SQLContext(sc)

    # Redshift speaks the PostgreSQL wire protocol, so the standard PostgreSQL JDBC
    # driver works; make sure the driver jar is on Spark's classpath.
    df = sqlContext.read.load(
        format="jdbc",
        url="jdbc:postgresql://host:port/dbname?user=yourusername&password=secret",
        dbtable="schema.table"
    )
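
    Since the question is about joining data on S3 with data in Redshift, here is a minimal sketch of how the two DataFrames could be joined once both are loaded; the S3 bucket/path, the Parquet format, and the join column user_id are placeholders for illustration, not part of the setup above:

    # Load the S3 side (assuming Parquet files at a hypothetical bucket/path).
    s3_df = sqlContext.read.parquet("s3n://your-bucket/path/to/data")

    # Join with the Redshift DataFrame loaded above on a hypothetical shared key.
    joined = df.join(s3_df, df.user_id == s3_df.user_id)
    joined.show()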
    

    Hope this helps someone!
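
    For what it's worth, if you are on Spark 2.x or later, the same read goes through SparkSession instead of SQLContext. A rough equivalent, using the same placeholder host, database, and credentials as above:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("redshift-jdbc").getOrCreate()

    # Same JDBC options as before, passed individually via option().
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://host:port/dbname")
          .option("dbtable", "schema.table")
          .option("user", "yourusername")
          .option("password", "secret")
          .load())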
