ImportError: No module named numpy on spark workers

抹茶落季 2020-12-05 03:12

Launching pyspark in client mode: bin/pyspark --master yarn-client --num-executors 60. import numpy works fine in the shell, but it fails inside the KMeans job: somehow the workers raise ImportError: No module named numpy even though the driver imports it without a problem.

6 Answers
  •  無奈伤痛
    2020-12-05 03:47

    To use Spark in Yarn client mode, you'll need to install any dependencies on the machines on which Yarn starts the executors. That's the only surefire way to make this work.
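
    As a quick sanity check, you can probe the executors from the pyspark shell. This is a minimal sketch: it assumes the SparkContext sc that the shell provides, and the partition count of 100 is just a guess meant to touch every executor at least once.

        # Minimal sketch: report which hosts can import numpy.
        # Assumes the pyspark shell's SparkContext `sc`.
        import socket

        def probe(_records):
            host = socket.gethostname()
            try:
                import numpy
                yield (host, numpy.__version__)
            except ImportError:
                yield (host, None)

        # Run one probe per partition; hosts paired with None lack numpy.
        pairs = sc.parallelize(range(100), 100).mapPartitions(probe).distinct()
        print(pairs.collect())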

    Using Spark with Yarn cluster mode is a different story. There you can ship pure-Python dependencies to the executors with spark-submit's --py-files option.

    spark-submit --master yarn-cluster --py-files my_dependency.zip my_script.py

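
    For a pure-Python dependency this works as long as the package sits at the root of the zip. A minimal sketch of building such an archive, where my_dependency/ is a placeholder for your package directory (run from its parent directory):

        # Sketch: zip a pure-Python package for --py-files.
        # The package directory must end up at the archive root.
        import os
        import zipfile

        with zipfile.ZipFile("my_dependency.zip", "w") as zf:
            for root, _dirs, files in os.walk("my_dependency"):
                for name in files:
                    path = os.path.join(root, name)
                    zf.write(path, arcname=path)  # keep package-relative paths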

    However, the situation with numpy is complicated by the same thing that makes it so fast: the fact that it does the heavy lifting in C. Because it is installed as a set of compiled C extension modules, which Python cannot import from a zip archive, you won't be able to distribute numpy in this fashion.
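
    You can see the problem by listing the compiled extension modules inside a local numpy install; Python's zipimport cannot load .so/.pyd files, so a zipped numpy fails on the executors. A sketch, for illustration only:

        # Sketch: list numpy's compiled extension modules (.so/.pyd),
        # which zipimport cannot load from a zip archive.
        import os
        import numpy

        pkg_dir = os.path.dirname(numpy.__file__)
        for root, _dirs, files in os.walk(pkg_dir):
            for name in files:
                if name.endswith((".so", ".pyd")):
                    print(os.path.join(root, name))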
