How to use spark-submit configuration: jars, packages in cluster mode?

Submitted by 眉间皱痕 on 2019-12-25 07:26:50

Question


When using spark-submit in cluster mode (yarn-cluster), the jars and packages configuration confused me. For --jars, I can put the jars in HDFS instead of in a local directory. But for --packages, because the artifacts are resolved through Maven, it does not work with an HDFS path. My attempt looks like this:

spark-submit \
  --jars hdfs:///mysql-connector-java-5.1.39-bin.jar \
  --driver-class-path /home/liac/test/mysql-connector-java-5.1.39/mysql-connector-java-5.1.39-bin.jar \
  --conf "spark.mongodb.input.uri=mongodb://192.168.27.234/test.myCollection2?readPreference=primaryPreferred" \
  --conf "spark.mongodb.output.uri=mongodb://192.168.27.234/test.myCollection2" \
  --packages com.mongodb.spark:hdfs:///user/liac/package/jars/mongo-spark-connector_2.11-1.0.0-assembly.jar:1.0.0 \
  --py-files /home/liac/code/diagnose_disease/tool.zip \
  main_disease_tag_spark.py --master yarn-client

The following error occurs:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Provided Maven Coordinates must be in the form 'groupId:artifactId:version'. The coordinate provided is: com.mongodb.spark:hdfs:///user/liac/package/jars/mongo-spark-connector_2.11-1.0.0-assembly.jar:1.0.0

Can anyone tell me how to use --jars and --packages in cluster mode, and what is wrong with my approach?


Answer 1:


Your use of the --packages argument is wrong:

--packages com.mongodb.spark:hdfs:///user/liac/package/jars/mongo-spark-connector_2.11-1.0.0-assembly.jar:1.0.0

It needs to be in the form groupId:artifactId:version, as the error message says. You cannot pass a URL to it.

An example of using MongoDB with Spark via the built-in repository support:

$SPARK_HOME/bin/spark-shell --packages org.mongodb.spark:mongo-spark-connector_2.11:1.0.0
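
Applying the same fix to the spark-submit command from the question, a corrected invocation might look like the sketch below (paths, URIs, and options are kept from the original; note that spark-submit options such as --master must appear before the application file, otherwise they are passed to the application as arguments):

spark-submit \
  --master yarn-client \
  --jars hdfs:///mysql-connector-java-5.1.39-bin.jar \
  --driver-class-path /home/liac/test/mysql-connector-java-5.1.39/mysql-connector-java-5.1.39-bin.jar \
  --conf "spark.mongodb.input.uri=mongodb://192.168.27.234/test.myCollection2?readPreference=primaryPreferred" \
  --conf "spark.mongodb.output.uri=mongodb://192.168.27.234/test.myCollection2" \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:1.0.0 \
  --py-files /home/liac/code/diagnose_disease/tool.zip \
  main_disease_tag_spark.py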

If you insist on using your own jar, you can provide it via --repositories. The value of that argument is:

Comma-separated list of remote repositories to search for the Maven coordinates specified in packages.

For example, in your case, it could be

--repositories hdfs:///user/liac/package/jars/ --packages org.mongodb.spark:mongo-spark-connector_2.11:1.0.0
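
Note that --packages resolution in spark-submit goes through Ivy with Maven-compatible resolvers, so a repository passed via --repositories generally has to follow the standard Maven directory layout; a flat directory containing only the assembly jar will typically not resolve. For the coordinate org.mongodb.spark:mongo-spark-connector_2.11:1.0.0, a repository rooted at hdfs:///user/liac/package/jars/ would be expected to contain something like:

hdfs:///user/liac/package/jars/org/mongodb/spark/mongo-spark-connector_2.11/1.0.0/mongo-spark-connector_2.11-1.0.0.jar
hdfs:///user/liac/package/jars/org/mongodb/spark/mongo-spark-connector_2.11/1.0.0/mongo-spark-connector_2.11-1.0.0.pom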


Source: https://stackoverflow.com/questions/38912540/how-to-use-spark-submit-configuration-jars-packagesin-cluster-mode
