What's the difference between --archives, --files, and --py-files in PySpark job arguments?
Question: --archives, --files, --py-files, sc.addFile, and sc.addPyFile are quite confusing. Can someone explain them clearly?

Answer 1: These options are truly scattered all over the place. In general, add your data files via --files or --archives and your code files via --py-files. The latter are added to the PYTHONPATH, not the JVM classpath (c.f., here), so you can import and use them. As you can imagine, the CLI arguments are actually handled by the addFile and addPyFile functions (c.f., here).

From http://spark.apache.org
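To make the split between the three flags concrete, here is a minimal sketch that assembles a spark-submit command line in plain Python. The helper build_spark_submit and every file name in it (job.py, helpers.zip, lookup.csv, env.tar.gz) are hypothetical and used only for illustration. The comments summarize what each flag does according to the spark-submit help text.

```python
import shlex


def build_spark_submit(app, py_files=(), files=(), archives=()):
    """Assemble a spark-submit argument list for a PySpark application.

    Hypothetical helper for illustration; it only builds the command,
    it does not run it.
    """
    cmd = ["spark-submit"]
    if py_files:
        # Code dependencies (.py, .zip, .egg): placed on the PYTHONPATH,
        # so the job can import them.
        cmd += ["--py-files", ",".join(py_files)]
    if files:
        # Plain data or config files: placed in the working directory
        # of each executor.
        cmd += ["--files", ",".join(files)]
    if archives:
        # Archives: extracted into the working directory of each executor.
        # A "#name" suffix sets the directory name they unpack under.
        cmd += ["--archives", ",".join(archives)]
    cmd.append(app)
    return cmd


# All file names below are hypothetical placeholders.
cmd = build_spark_submit(
    "job.py",
    py_files=["helpers.zip"],
    files=["lookup.csv"],
    archives=["env.tar.gz#env"],
)
print(shlex.join(cmd))
```

Inside a running job, sc.addPyFile and sc.addFile are the programmatic counterparts of --py-files and --files. A file shipped with --files or sc.addFile can be located on a worker with SparkFiles.get, passing the bare file name.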