Question
In Spark, what's the difference between adding JARs to the classpath via the --packages argument and just adding the JARs directly to the $SPARK_HOME/jars directory?
Answer 1:
TL;DR --jars takes local or remote jar files specified by URL and doesn't resolve dependencies; --packages takes Maven coordinates and does resolve transitive dependencies. From the docs:
--jars: When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included in the driver and executor classpaths. Directory expansion does not work with --jars.
--packages: Users may also include any other dependencies by supplying a comma-delimited list of Maven coordinates with --packages. All transitive dependencies will be handled when using this command. Additional repositories (or resolvers in SBT) can be added in a comma-delimited fashion with the flag --repositories. (Note that credentials for password-protected repositories can be supplied in some cases in the repository URI, such as in https://user:password@host/.... Be careful when supplying credentials this way.) These commands can be used with pyspark, spark-shell, and spark-submit to include Spark Packages.
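A minimal sketch of how the two flags look in practice. The jar paths and the application class below are hypothetical placeholders; the Kafka connector coordinate is a real Maven artifact used only as an example:

```bash
# --jars: ship explicit jar files; no dependency resolution is performed.
# Paths/URLs are comma-separated, and each jar is added as-is to the
# driver and executor classpaths.
spark-submit \
  --jars /opt/libs/my-udfs.jar,hdfs:///libs/another-lib.jar \
  --class com.example.App \
  app.jar

# --packages: supply Maven coordinates (groupId:artifactId:version);
# Spark resolves the artifact plus all transitive dependencies from
# Maven Central (or from repositories added via --repositories).
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 \
  --class com.example.App \
  app.jar
```

With --packages, resolved artifacts are cached locally (by default under ~/.ivy2), so repeated submissions do not re-download them.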
Source: https://stackoverflow.com/questions/50333750/spark-packages-flag-vs-jars-dir