Adding postgresql jar through spark-submit on Amazon EMR

Posted by 别来无恙 on 2019-12-09 13:54:30

Question


I've tried spark-submit with --driver-class-path, with --jars, and also the method described at https://petz2000.wordpress.com/2015/08/18/get-blas-working-with-spark-on-amazon-emr/

When I set SPARK_CLASSPATH on the command line, as in

SPARK_CLASSPATH=/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar pyspark

I get this error

Found both spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.

But I'm not able to add it. How do I add the postgresql JDBC jar file so I can use it from pyspark? I'm using EMR release 4.2.

Thanks


Answer 1:


1) Clear the environment variable:

unset SPARK_CLASSPATH

2) Use the --jars option to distribute the postgres driver across your cluster:

pyspark --jars=/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar
# or
spark-submit --jars=/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar <your py script or app jar>
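Once the driver jar is on the classpath via --jars, reading a Postgres table from pyspark looks roughly like the sketch below. The host, port, database, table, and credentials are placeholders, and the helper function name is mine, not part of any Spark API:

```python
# Build the option dict for spark.read.format("jdbc").
# All connection values here are hypothetical placeholders.
def pg_jdbc_options(host, db, table, user, password, port=5432):
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{db}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Class provided by the postgresql jar passed via --jars
        "driver": "org.postgresql.Driver",
    }

# Inside a pyspark session started with
#   pyspark --jars=/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar
# you would then load a DataFrame like:
#   df = (spark.read.format("jdbc")
#             .options(**pg_jdbc_options("myhost", "mydb",
#                                        "mytable", "myuser", "secret"))
#             .load())
```

If the jar was not distributed correctly, the `.load()` call is where you would see a `java.lang.ClassNotFoundException: org.postgresql.Driver`.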



Answer 2:


Adding the jar path to the spark.driver.extraClassPath line in /etc/spark/conf/spark-defaults.conf solved my issue.
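As a sketch, the edited line in /etc/spark/conf/spark-defaults.conf would look something like this (using the jar path from the question; on EMR this key usually already has a value, so append with a `:` separator rather than replacing it):

```
spark.driver.extraClassPath  <existing entries>:/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar
```

Note this only puts the jar on the driver's classpath; for executors you would also extend spark.executor.extraClassPath, or simply use --jars as in Answer 1.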



Source: https://stackoverflow.com/questions/37130780/adding-postgresql-jar-though-spark-submit-on-amazon-emr
