I\'m trying to use spark-submit
to execute my python code in spark cluster.
Generally we run spark-submit
with python code like below.
Ah, it's possible. http://caen.github.io/hadoop/user-spark.html
spark-submit \
--master yarn-client \ # Run this as a Hadoop job
--queue \ # Run on your_queue
--num-executors 10 \ # Run with a certain number of executors, for example 10
--executor-memory 12g \ # Specify each executor's memory, for example 12GB
--executor-cores 2 \ # Specify each executor's amount of CPUs, for example 2
job.py ngrams/input ngrams/output