Pyspark command not recognised

Submitted by 孤人 on 2021-02-19 01:17:39

Question


I have Anaconda installed, and I have also downloaded Spark 1.6.2. I am following the instructions from this answer to configure Spark for Jupyter.

I have downloaded and unzipped the spark directory as

~/spark

Now when I cd into this directory and into bin I see the following

SFOM00618927A:spark $ cd bin
SFOM00618927A:bin $ ls
beeline         pyspark         run-example.cmd     spark-class2.cmd    spark-sql       sparkR
beeline.cmd     pyspark.cmd     run-example2.cmd    spark-shell     spark-submit        sparkR.cmd
load-spark-env.cmd  pyspark2.cmd        spark-class     spark-shell.cmd     spark-submit.cmd    sparkR2.cmd
load-spark-env.sh   run-example     spark-class.cmd     spark-shell2.cmd    spark-submit2.cmd

I have also added the environment variables as mentioned in the above answer to my .bash_profile and .profile

Now, in the spark/bin directory, the first thing I want to check is whether the pyspark command works in the shell.

So after doing cd spark/bin I run this:

SFOM00618927A:bin $ pyspark
-bash: pyspark: command not found

As per the answer, after following all the steps I should be able to just run

pyspark 

in a terminal from any directory, and it should start a Jupyter notebook backed by Spark. But pyspark is not even working within the shell, let alone launching a Jupyter notebook.

Please advise what is going wrong here.

Edit:

I did

open .profile 

in my home directory, and this is what is stored in it:

export PATH=/Users/854319/anaconda/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/Users/854319/spark/bin
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark

Answer 1:


1- You need to set JAVA_HOME and the Spark paths for the shell to find them. After setting them in your .profile you may want to run

source ~/.profile

to activate the settings in the current session. From your comment I can see you are already hitting the JAVA_HOME issue.

Note that if you have a .bash_profile or .bash_login, .profile will not be read, as described here.
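
A minimal sketch of what point 1 could look like in .profile, assuming Spark was unpacked to ~/spark as in the question (adjust the paths to your machine; /usr/libexec/java_home is the standard macOS helper for locating the installed JDK):

export JAVA_HOME=$(/usr/libexec/java_home)   # path of the active JDK on macOS
export SPARK_HOME=~/spark                    # the unzipped Spark directory from the question
export PATH="$SPARK_HOME/bin:$PATH"          # lets the shell find pyspark from any directory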

2- When you are in spark/bin you need to run

./pyspark

to tell the shell that the target is in the current folder.
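
To illustrate the difference, using the same prompt as in the question (a bare command name is looked up only in the directories listed in $PATH; the ./ form runs the script sitting in the current folder):

SFOM00618927A:bin $ pyspark
-bash: pyspark: command not found
SFOM00618927A:bin $ ./pyspark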




Answer 2:


Here are my environment variables; I hope they help:

# path to JAVA_HOME
export JAVA_HOME=$(/usr/libexec/java_home)

#Spark
export SPARK_HOME="/usr/local/spark" #version 1.6
export PATH=$PATH:$SPARK_HOME/bin
export PYSPARK_SUBMIT_ARGS="--master local[2]"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

^^ Remove the PYSPARK_DRIVER_PYTHON_OPTS line if you don't want the notebook to launch; alternatively, leave it out of your profile entirely and set it on the command line when you need it.

I have the Anaconda variables on another line that appends to the PATH.
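
If you keep PYSPARK_DRIVER_PYTHON_OPTS out of your profile, a sketch of supplying it inline for a single run (the variables then apply only to that one command):

# plain PySpark shell, no notebook
pyspark

# one-off run that launches the Jupyter notebook driver instead
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark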




Answer 3:


For anyone who came here on or after macOS Catalina, make sure you're setting and sourcing the variables in ~/.zshrc rather than in a bash profile, since zsh is now the default shell.

$ nano ~/.zshrc

# Set Spark Path
export SPARK_HOME="YOUR_PATH/spark-3.0.1-bin-hadoop2.7"
export PATH="$SPARK_HOME/bin:$PATH"

# Set pyspark + jupyter commands
export PYSPARK_SUBMIT_ARGS="pyspark-shell"
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='lab' pyspark

$ source ~/.zshrc

$ pyspark # Automatically opens Jupyter Lab w/ PySpark initialized.
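
A couple of quick sanity checks after sourcing, to confirm the shell is picking up the new settings (plain shell commands; the exact output depends on YOUR_PATH above):

$ which pyspark      # should resolve to $SPARK_HOME/bin/pyspark
$ echo $SPARK_HOME   # should print the Spark directory set above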



Source: https://stackoverflow.com/questions/38798816/pyspark-command-not-recognised
