Launching pyspark in client mode. bin/pyspark --master yarn-client --num-executors 60 The import numpy on the shell goes fine but it fails in the kmeans. Someho
What solved it for me (On mac) was actually this guide (Which also explains how to run python through Jupyter Notebooks -
https://medium.com/@yajieli/installing-spark-pyspark-on-mac-and-fix-of-some-common-errors-355a9050f735
In a nutshell:
(Assuming you installed spark with brew install spark)
SPARK_PATH using - brew info apache-spark~/.bash_profile# Spark and Python
######
export SPARK_PATH=/usr/local/Cellar/apache-spark/2.4.1
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
#For python 3, You have to add the line below or you will get an error
export PYSPARK_PYTHON=python3
alias snotebook='$SPARK_PATH/bin/pyspark --master local[2]'
######
Jupyter Notebook simply by calling:
pysparkAnd just remember you don't need to set the Spark Context but instead simply call:
sc = SparkContext.getOrCreate()