How to correctly set python version in Spark?

Submitted by a 夏天 on 2020-05-17 07:52:17

Question


My Spark version is 2.4.0, and the cluster has Python 2.7 and Python 3.7; the default is Python 2.7. Now I want to submit a PySpark program that uses Python 3.7. I tried two approaches, but neither of them works.

  1. spark2-submit --master yarn \
    --conf "spark.pyspark.python=/usr/bin/python3" \
    --conf "spark.pyspark.driver.python=/usr/bin/python3" \
    pi.py
    

    It doesn't work and fails with:

    Cannot run program "/usr/bin/python3": error=13, Permission denied
    

    But I actually do have permission; for example, I can run a Python program with /usr/bin/python3 test.py.

  2. export PYSPARK_PYTHON=/usr/bin/python3
    export PYSPARK_DRIVER_PYTHON=/usr/bin/python3
    

    With this approach, Spark does not pick up Python 3 at all.
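A quick way to check whether either setting actually takes effect is to print the interpreter that the driver and the executors run under. Below is a minimal diagnostic sketch, assuming PySpark is importable on the cluster; the app name and partition count are arbitrary placeholders.

import sys
from pyspark import SparkContext

# Start a context just for the check (app name is an arbitrary placeholder).
sc = SparkContext(appName="python-version-check")

# Interpreter used by the driver process.
print("driver:", sys.executable, sys.version.split()[0])

# Run a tiny job so the lambda executes on the executors and reports their interpreter.
executor_pythons = (
    sc.parallelize(range(4), 2)
      .map(lambda _: (sys.executable, sys.version.split()[0]))
      .distinct()
      .collect()
)
print("executors:", executor_pythons)

sc.stop()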


Answer 1:


In my experience, it is much easier to set the Spark location inside the Python script itself; findspark can do that:

import findspark

spark_location = '/opt/spark-2.4.3/'  # set to your own Spark installation path
findspark.init(spark_home=spark_location)
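To also force a specific interpreter from within the script, a common pattern is to set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON before creating the session. A minimal sketch, assuming /usr/bin/python3 exists on every node and Spark lives under /opt/spark-2.4.3/:

import os
import findspark

# Interpreter for executors and driver (paths are assumptions; adjust for your cluster).
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3"

# Locate the Spark installation before importing pyspark.
findspark.init(spark_home="/opt/spark-2.4.3/")

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("pi").getOrCreate()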



Answer 2:


I encountered the same problem.

Configuring the environment variables at the beginning of the script did not work for me (Spark would not execute the tasks).

Without restarting the cluster, just executing the command below worked for me.

sudo sed -i -e '$a\export PYSPARK_PYTHON=/usr/bin/python3' /etc/spark/conf/spark-env.sh
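For reference, the '$a\' part of the sed expression appends the export line to the end of /etc/spark/conf/spark-env.sh, which Spark's launch scripts source, so Spark processes started afterwards pick up Python 3 without restarting the cluster. Note that the config path is distribution-specific and may differ on your setup.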


Source: https://stackoverflow.com/questions/57953227/how-to-correctly-set-python-version-in-spark
