Question
I've tried spark-submit with --driver-class-path and with --jars, and I've also tried the approach described at https://petz2000.wordpress.com/2015/08/18/get-blas-working-with-spark-on-amazon-emr/
When I set SPARK_CLASSPATH on the command line, as in
SPARK_CLASSPATH=/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar pyspark
I get this error:
Found both spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.
But I'm not able to add it. How do I add the PostgreSQL JDBC jar so I can use it from pyspark? I'm using EMR release 4.2.
Thanks
Answer 1:
1) Clear the environment variable:
unset SPARK_CLASSPATH
2) Use the --jars option to distribute the Postgres driver across your cluster:
pyspark --jars=/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar
# or
spark-submit --jars=/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar <your py script or app jar>
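For reference, once the jar is on the classpath this way, the driver can be exercised from the pyspark shell roughly as follows. This is a minimal sketch, not part of the original answer: the host, port, database, table, and credentials are placeholders, and sqlContext is the SQLContext that the Spark 1.x pyspark shell (as shipped with EMR 4.x) creates for you.

# Read a Postgres table over JDBC; all connection details below are placeholders.
df = sqlContext.read.format("jdbc").options(
    url="jdbc:postgresql://dbhost:5432/mydb",
    dbtable="mytable",
    user="myuser",
    password="mypassword",
    driver="org.postgresql.Driver"
).load()
df.printSchema()  # quick check that the connection and schema resolve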
Answer 2:
Adding the jar path to the spark.driver.extraClassPath row in /etc/spark/conf/spark-defaults.conf solved my issue.
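For example, the relevant row might look roughly like the sketch below, using the jar path from the question. On EMR this property typically already holds a colon-separated list of paths, in which case the jar path is appended to the existing value rather than replacing it; spark.executor.extraClassPath can be extended the same way if the executors also need the driver.

# /etc/spark/conf/spark-defaults.conf (sketch; append to any existing value)
spark.driver.extraClassPath    /home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar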
Source: https://stackoverflow.com/questions/37130780/adding-postgresql-jar-though-spark-submit-on-amazon-emr