Spark packages flag vs jars dir?

Asked by 别来无恙 on 2021-02-09 02:48:37

Question


In Spark, what's the difference between adding JARs to the classpath via the --packages argument and just adding the JARs directly to the $SPARK_HOME/jars directory?


Answer 1:


TL;DR: --jars is for local or remote JAR files specified by URL and does not resolve dependencies; --packages is for Maven coordinates and does resolve transitive dependencies. From the docs:

  • --jars

    When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included in the driver and executor classpaths. Directory expansion does not work with --jars.

  • --packages

    Users may also include any other dependencies by supplying a comma-delimited list of Maven coordinates with --packages. All transitive dependencies will be handled when using this command. Additional repositories (or resolvers in SBT) can be added in a comma-delimited fashion with the flag --repositories. (Note that credentials for password-protected repositories can be supplied in some cases in the repository URI, such as in https://user:password@host/.... Be careful when supplying credentials this way.) These commands can be used with pyspark, spark-shell, and spark-submit to include Spark Packages.
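For illustration, here is a minimal sketch of both invocations. The JAR paths, the application class, and the app JAR name are placeholders, not from the original post; the Maven coordinate is just one example of the groupId:artifactId:version format.

    # --jars: ship explicit JAR files (local paths or URLs).
    # No dependency resolution is performed, so any transitive
    # dependencies must be listed here as well.
    spark-submit \
      --jars /path/to/custom-lib.jar,https://repo.example.com/other-lib.jar \
      --class com.example.MyApp \
      my-app.jar

    # --packages: give Maven coordinates instead. Spark resolves and
    # downloads the artifact plus its transitive dependencies, optionally
    # searching extra repositories given via --repositories.
    spark-submit \
      --packages org.apache.spark:spark-avro_2.12:3.3.0 \
      --class com.example.MyApp \
      my-app.jar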



Source: https://stackoverflow.com/questions/50333750/spark-packages-flag-vs-jars-dir
