Question
My Spark version is 2.4.0, and the cluster has both Python 2.7 and Python 3.7; the default is Python 2.7. Now I want to submit a PySpark program that uses Python 3.7. I tried two approaches, but neither of them works.
spark2-submit --master yarn \
  --conf "spark.pyspark.python=/usr/bin/python3" \
  --conf "spark.pyspark.driver.python=/usr/bin/python3" pi.py
It does not work and reports:
Cannot run program "/usr/bin/python3": error=13, Permission denied
But I actually do have permission; for example, I can run
/usr/bin/python3 test.py
to execute a Python program. The second approach I tried was exporting the environment variables:
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3
With this approach, Spark does not use Python 3 at all.
Answer 1:
In my experience, including the Spark location inside the Python script itself tends to be much easier; for this, use findspark.
import findspark
spark_location = '/opt/spark-2.4.3/'  # set this to your own Spark installation path
findspark.init(spark_home=spark_location)
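Building on that, here is a minimal sketch of a complete script. The paths, the app name, and the use of PYSPARK_PYTHON/PYSPARK_DRIVER_PYTHON inside the script are illustrative assumptions, not part of the original answer; adjust them to your cluster.

import os

# Point both the driver and the workers at Python 3 before any Spark code runs.
# (Paths are assumptions; adjust to your environment.)
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = '/usr/bin/python3'

import findspark
findspark.init(spark_home='/opt/spark-2.4.3/')  # set your own

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('pi').getOrCreate()
print(spark.sparkContext.pythonVer)  # expect '3.7' if the override took effect
spark.stop()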
Answer 2:
I encountered the same problem.
The solution of configuring the environment at the beginning of the script (in Spark itself, not in the executing tasks) did not work for me. Without restarting the cluster, just running the command below worked for me; the sed expression appends the PYSPARK_PYTHON export as the last line of /etc/spark/conf/spark-env.sh.
sudo sed -i -e '$a\export PYSPARK_PYTHON=/usr/bin/python3' /etc/spark/conf/spark-env.sh
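To confirm that executors actually pick up Python 3 afterwards, a quick check like the following can help. This is a sketch under the assumption that pyspark is importable on the driver and the cluster runs YARN; the app name is made up.

from pyspark.sql import SparkSession

spark = SparkSession.builder.master('yarn').appName('python-check').getOrCreate()

# Ask a couple of executors which Python interpreter they are running.
versions = (spark.sparkContext
            .parallelize(range(2), 2)
            .map(lambda _: __import__('sys').version)
            .collect())
print(versions)  # expect strings starting with '3.7'
spark.stop()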
Source: https://stackoverflow.com/questions/57953227/how-to-correctly-set-python-version-in-spark