PySpark: run a script from inside the archive
Question

I have an archive (basically a bundled conda environment plus my application) which I can easily use with PySpark using YARN as the master:

    PYSPARK_PYTHON=./pkg/venv/bin/python3 \
    spark-submit \
      --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./pkg/venv/bin/python3 \
      --master yarn \
      --deploy-mode cluster \
      --archives hdfs:///package.tgz#pkg \
      app/MyScript.py

This works as expected, no surprise here. Now, how could I run this if MyScript.py is inside package.tgz, not on my local filesystem? I would like