ImportError: No module named numpy on spark workers

抹茶落季 · 2020-12-05 03:12

Launching pyspark in client mode with bin/pyspark --master yarn-client --num-executors 60. import numpy works fine in the shell, but it fails inside the KMeans job; somehow the executors do not seem to have numpy installed.
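To confirm it is the executors, not the driver, that are missing numpy, a quick check can be run from the same shell (a minimal sketch, assuming the shell's sc; the partition count of 60 is arbitrary and just spreads tasks across the executors):

    # Ask each partition's executor whether it can import numpy.
    def try_numpy(_):
        try:
            import numpy
            yield ("ok", numpy.__version__)
        except ImportError as exc:
            yield ("missing", str(exc))

    # Count how many partitions succeeded vs. failed.
    print(sc.parallelize(range(60), 60).mapPartitions(try_numpy).countByValue())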

6 answers
  •  余生分开走 · 2020-12-05 03:48

    What solved it for me (on Mac) was actually this guide, which also explains how to run Python through Jupyter notebooks: https://medium.com/@yajieli/installing-spark-pyspark-on-mac-and-fix-of-some-common-errors-355a9050f735

    In a nutshell (assuming you installed Spark with brew install apache-spark):

    1. Find the SPARK_PATH using brew info apache-spark
    2. Add these lines to your ~/.bash_profile:
    # Spark and Python
    ######
    export SPARK_PATH=/usr/local/Cellar/apache-spark/2.4.1
    export PYSPARK_DRIVER_PYTHON="jupyter"
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
    # For Python 3, you have to add the line below or you will get an error
    export PYSPARK_PYTHON=python3
    alias snotebook='$SPARK_PATH/bin/pyspark --master local[2]'
    ######
    
    3. Reload your profile (source ~/.bash_profile); you should then be able to open a Jupyter notebook simply by calling: pyspark

    And just remember that you don't need to create the SparkContext yourself; instead simply call:

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
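    If you want to verify that the PYSPARK_PYTHON setting actually reached the workers, one option is to compare interpreter versions on the driver and the executors (a minimal sketch; the partition count of 4 is arbitrary):

    import sys
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    # A mismatch between these two is a common cause of modules that
    # import fine in the shell but fail inside a job.
    driver = sys.version_info[:3]
    workers = (sc.parallelize(range(4), 4)
                 .map(lambda _: __import__("sys").version_info[:3])
                 .distinct()
                 .collect())
    print("driver:", driver, "workers:", workers)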
    
