I am launching pyspark in client mode:
bin/pyspark --master yarn-client --num-executors 60
Running import numpy in the shell works fine, but it fails inside kmeans. Someho
You need numpy installed on every worker node, and on the master as well (depending on your component placement).
Also make sure to run the pip install numpy command from a root account (sudo does not suffice) after forcing the umask to 022 (umask 022), so that the read permissions cascade to the Spark (or Zeppelin) user.
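
To see which nodes are actually missing numpy (or cannot read it because of permissions), a small probe run from the pyspark shell can help. This is only a sketch: probe_module and probe_cluster are hypothetical helper names, sc is assumed to be the SparkContext that the pyspark shell creates, and the slice count of 200 is an arbitrary guess meant to spread tasks over all executors.

```python
import importlib
import socket


def probe_module(name="numpy"):
    """Try to import `name` in the current Python process.

    Returns (hostname, version or None, error message or None).
    """
    host = socket.gethostname()
    try:
        mod = importlib.import_module(name)
        return (host, getattr(mod, "__version__", "unknown"), None)
    except ImportError as exc:
        # A missing package and an unreadable package directory both end up here.
        return (host, None, str(exc))


def probe_cluster(sc, name="numpy", slices=200):
    """Run probe_module in `slices` tasks and return the distinct results.

    `sc` is assumed to be an existing SparkContext. Deduplication happens on
    the driver, so each (host, version, error) combination appears once.
    """
    results = (
        sc.parallelize(range(slices), slices)
        .map(lambda _: probe_module(name))
        .collect()
    )
    return sorted(set(results), key=lambda r: r[0])
```

In the shell, calling probe_cluster(sc) and printing each tuple shows one entry per host and outcome. Any host reported with an error message is one where the executor's Python cannot import numpy, which is where the pip install step still needs to be done. A host that never ran a task will not appear, so raising slices may be needed on a large cluster.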