Using all resources in Apache Spark with Yarn

Submitted by 随声附和 on 2019-12-11 19:33:47

Question


I am using Apache Spark with the YARN client. I have 4 worker PCs in my Spark cluster, each with 8 vCPUs and 30 GB of RAM. I set my executor memory to 2G and the number of instances to 33. My job takes 10 hours to run and all machines are about 80% idle.

I don't understand the correlation between executor memory and executor instances. Should I have one instance per vCPU? Should I set the executor memory to (memory of the machine) / (number of executors per machine)?


Answer 1:


I believe that you have to use the following command:

spark-submit --num-executors 4 --executor-memory 7G --driver-memory 2G --executor-cores 8 --class "YourClassName" --master yarn-client

The number of executors should be 4, since you have 4 workers. The executor memory should be close to the maximum memory that each YARN node has allocated, roughly 5-6 GB (I assume you have 30 GB of total RAM).
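If you prefer to set these values in code rather than on the command line, a rough equivalent is sketched below (assuming Spark 1.x in yarn-client mode; the app name is a placeholder and the values simply mirror the command above):

import org.apache.spark.{SparkConf, SparkContext}

// A minimal sketch of the same resource settings expressed through SparkConf
// instead of spark-submit flags (values taken from the answer above).
val conf = new SparkConf()
  .setAppName("YourClassName")
  .setMaster("yarn-client")
  .set("spark.executor.instances", "4")  // one executor per worker node
  .set("spark.executor.cores", "8")      // use all 8 vCPUs on each node
  .set("spark.executor.memory", "7g")    // leave headroom for the OS and YARN overhead
// Driver memory must be set before the driver JVM starts, so in client mode
// keep --driver-memory on the spark-submit command line.
val sc = new SparkContext(conf)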

You should take a look at the spark-submit parameters and fully understand them.




Answer 2:


We were using Cassandra as our data source for Spark. The problem was that there were not enough partitions; we needed to split up the data more. Our mapping from Cassandra partitions to Spark partitions was not fine-grained enough, so we would only generate 10 or 20 tasks instead of hundreds.
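As an illustration of that fix, a hedged sketch using the spark-cassandra-connector follows (keyspace, table name, and target partition count are made up for the example):

import com.datastax.spark.connector._

// Read the table, then repartition so there are enough tasks to keep
// every executor core busy (keyspace/table names are illustrative).
val rdd = sc.cassandraTable("my_keyspace", "my_table")
println(s"partitions before: ${rdd.partitions.length}")  // e.g. only 10-20

// Aim for a few partitions per core: 4 nodes * 8 cores * ~4 waves = 128
val repartitioned = rdd.repartition(128)
println(s"partitions after: ${repartitioned.partitions.length}")

Depending on the connector version, tuning the input split size (for example spark.cassandra.input.split.size_in_mb) may also raise the initial partition count without the extra shuffle that repartition introduces.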



Source: https://stackoverflow.com/questions/30457314/using-all-resources-in-apache-spark-with-yarn
