I'm trying to connect to Amazon Redshift via Spark, so I can join data we have on S3 with data on our Redshift cluster. I found some very spartan documentation here for the capability.
It turns out you only need a username and password to access Redshift from Spark, since Redshift speaks the PostgreSQL wire protocol. Using the Python API, it is done as follows (you also need the PostgreSQL JDBC driver jar on Spark's classpath, e.g. via spark-submit --jars):
from pyspark.sql import SQLContext

# `sc` is the SparkContext that the pyspark shell provides.
sqlContext = SQLContext(sc)

# Redshift is reachable through the standard PostgreSQL JDBC driver, so the
# URL uses the postgresql:// scheme; fill in your cluster endpoint, port,
# and database name.
df = sqlContext.read.format("jdbc").options(
    url="jdbc:postgresql://host:port/dbname?user=yourusername&password=secret",
    dbtable="schema.table",
).load()
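And since the original goal was joining this with data on S3, here is a minimal sketch of that join. It assumes the S3 data is stored as Parquet; the bucket path and the "user_id" join column are placeholders, not real names from my setup:

# Hypothetical: read Parquet data from S3, then join it against the
# Redshift-backed DataFrame on a shared key column.
s3_df = sqlContext.read.parquet("s3n://your-bucket/path/to/data/")
joined = df.join(s3_df, "user_id")
joined.show()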
Hope this helps someone!