I'm trying to connect Spark with Amazon Redshift but I'm getting this error:
My code is as follows:
from pyspark.sql import SQLContext
from pyspark import SparkContext
The error is due to missing dependencies.
Verify that you have these jar files in the Spark home directory:

spark-redshift_2.10-3.0.0-preview1.jar
RedshiftJDBC41-1.1.10.1010.jar
hadoop-aws-2.7.1.jar
aws-java-sdk-s3-1.11.60.jar
aws-java-sdk-1.7.4.jar

Put these jar files in $SPARK_HOME/jars/ and then start Spark:
pyspark --jars $SPARK_HOME/jars/spark-redshift_2.10-3.0.0-preview1.jar,$SPARK_HOME/jars/RedshiftJDBC41-1.1.10.1010.jar,$SPARK_HOME/jars/hadoop-aws-2.7.1.jar,$SPARK_HOME/jars/aws-java-sdk-s3-1.11.60.jar,$SPARK_HOME/jars/aws-java-sdk-1.7.4.jar
(With a Homebrew install on macOS, SPARK_HOME is "/usr/local/Cellar/apache-spark/$SPARK_VERSION/libexec".)
This will run Spark with all the necessary dependencies. Note that you also need to set the authentication option 'forward_spark_s3_credentials' to true if you are using AWS access keys, as in the corrected code further below.
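If you start Spark from a plain Python script or a notebook rather than from the pyspark shell, the same jars can be passed through the PYSPARK_SUBMIT_ARGS environment variable before the SparkContext is created. This is only a sketch under that assumption; it reuses the jar locations listed above, so adjust the paths if yours differ.

import os

# Same jar files as above; the paths assume they were copied into $SPARK_HOME/jars/.
spark_home = os.environ["SPARK_HOME"]
jars = ",".join([
    spark_home + "/jars/spark-redshift_2.10-3.0.0-preview1.jar",
    spark_home + "/jars/RedshiftJDBC41-1.1.10.1010.jar",
    spark_home + "/jars/hadoop-aws-2.7.1.jar",
    spark_home + "/jars/aws-java-sdk-s3-1.11.60.jar",
    spark_home + "/jars/aws-java-sdk-1.7.4.jar",
])

# Must be set before the SparkContext (and its JVM) is created;
# the trailing "pyspark-shell" token is required by spark-submit.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars " + jars + " pyspark-shell"

With the dependencies in place, the corrected code looks like this: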
from pyspark.sql import SQLContext
from pyspark import SparkContext

sc = SparkContext(appName="Connect Spark with Redshift")
sql_context = SQLContext(sc)

# Placeholders: put your own AWS access keys here. They are used by the s3n
# filesystem for the temporary directory that spark-redshift unloads to.
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "<your-access-key-id>")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "<your-secret-access-key>")

# Read a Redshift table into a DataFrame; the data is unloaded to tempdir first.
df = sql_context.read \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://example.coyf2i236wts.eu-central-1.redshift.amazonaws.com:5439/agcdb?user=user&password=pwd") \
    .option("dbtable", "table_name") \
    .option("forward_spark_s3_credentials", "true") \
    .option("tempdir", "s3n://bucket") \
    .load()
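If you only need part of the table, the same data source also accepts a query instead of dbtable. A minimal sketch (the column name col_a and the query itself are just examples):

df_subset = sql_context.read \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://example.coyf2i236wts.eu-central-1.redshift.amazonaws.com:5439/agcdb?user=user&password=pwd") \
    .option("query", "select col_a, count(*) from table_name group by col_a") \
    .option("forward_spark_s3_credentials", "true") \
    .option("tempdir", "s3n://bucket") \
    .load()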
A common error afterwards is a connection failure because the cluster requires SSL; the fix is to enable SSL in the JDBC URL:

.option("url", "jdbc:redshift://example.coyf2i236wts.eu-central-1.redshift.amazonaws.com:5439/agcdb?user=user&password=pwd&ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory")

Note that org.postgresql.ssl.NonValidatingFactory encrypts the connection but skips validation of the server certificate.