Question
Whenever I add more than 10 executors, my jobs become a lot slower; with more than 15 executors, they start to crash. I generally use 4 cores per executor but have tried 2 to 5. I am using YARN and PySpark 2.1. The errors I receive are:
ERROR TransportRequestHandler: Error sending result RpcResponse
WARN NettyRpcEndpointRef: Error sending message
Future timed out after [10 seconds]
I have read that most people get this error because of OOM errors, but there is no sign of that anywhere in my stderr logs. I have tried changing spark.executor.heartbeatInterval to 30s, and that makes the "Future timed out" warning less frequent, but the results are the same.
I have tried to get better results using different numbers of partitions, varying from 30 to 1000. I have tried increasing my executor memory to 10g, even though I don't think that is the problem. I have tried everything from small datasets of only a few megabytes up to larger datasets of 50 GB. The only time I can get a lot of executors to work is when I am doing a very simple job, like reading in files and writing them somewhere else. In that situation the executors don't have to exchange (shuffle) data, so I'm wondering whether that is somehow the problem. Every other job where I do any aggregation, or collecting, or basically anything else, gives me the same errors, or at least extremely slow execution. I am just hoping there is some other suggestion I can try.
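For reference, here is a minimal sketch of how settings like these can be applied (on YARN they are usually passed to spark-submit instead; the values and path below are illustrative, not my exact ones):

```python
from pyspark.sql import SparkSession

# Illustrative settings only -- the actual values varied between runs.
spark = (
    SparkSession.builder
    .appName("slow-with-many-executors")                # hypothetical app name
    .config("spark.executor.instances", "15")           # tried roughly 10-20 executors
    .config("spark.executor.cores", "4")                # tried 2-5 cores per executor
    .config("spark.executor.memory", "10g")             # bumped up while debugging
    .config("spark.executor.heartbeatInterval", "30s")  # makes the timeout warning less frequent
    .getOrCreate()
)

# Partition counts tried ranged from 30 to 1000.
df = spark.read.parquet("hdfs:///path/to/input")        # hypothetical input path
df = df.repartition(200)
```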
Answer 1:
When allocating resources, you mainly have to look at the hardware settings of your cluster. Optimal provisioning is quite a tricky thing.
- Number of Nodes
- VCores
- Memory in each node
Based on these three, you have to decide the following:
- num-executors
- executor-cores
- executor-memory
Most of the time, setting --executor-cores to more than 5 gives degraded performance, so set it to 5.
Set num-executors = [{Number of Nodes * (VCores - 1)} / executor-cores] - 1
The simple rule behind this: set aside one core per node for the YARN/Hadoop daemons, and one whole executor for the ApplicationMaster.
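For example, on a hypothetical cluster of 10 nodes with 16 VCores each: num-executors = [{10 * (16 - 1)} / 5] - 1 = 30 - 1 = 29.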
Set executor-memory = [(M - 1) / {(VCores - 1) / executor-cores}] * (1 - 0.07)
Here, M is the memory (in GB) on each node, {(VCores - 1) / executor-cores} is the number of executors per node, and 0.07 is the fraction you have to set aside for off-heap memory overhead.
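Continuing the same hypothetical cluster with M = 64 GB per node: executors per node = (16 - 1) / 5 = 3, so executor-memory = [(64 - 1) / 3] * (1 - 0.07) ≈ 19 GB.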
Again, these formulas are nowhere etched in stone, so adjust them according to your use case. They are nothing but some generic rules that I follow.
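Putting the two formulas together, here is a minimal sketch in Python for a hypothetical cluster of 10 nodes with 16 VCores and 64 GB of memory each (adjust the numbers to your own hardware; passing the results to spark-submit works just as well):

```python
from pyspark.sql import SparkSession

# Hypothetical cluster -- replace with your own hardware settings.
nodes = 10          # Number of Nodes
vcores = 16         # VCores per node
mem_gb = 64         # Memory (GB) per node

executor_cores = 5  # more than 5 cores per executor tends to degrade performance

# One core per node is set aside for the YARN/Hadoop daemons,
# and one whole executor is set aside for the ApplicationMaster.
executors_per_node = (vcores - 1) // executor_cores           # 3
num_executors = nodes * executors_per_node - 1                # 29

# Split the usable node memory across executors and leave ~7% for off-heap overhead.
executor_memory_gb = int((mem_gb - 1) / executors_per_node * (1 - 0.07))  # 19

spark = (
    SparkSession.builder
    .appName("tuned-job")  # hypothetical app name
    .config("spark.executor.instances", str(num_executors))
    .config("spark.executor.cores", str(executor_cores))
    .config("spark.executor.memory", "{}g".format(executor_memory_gb))
    .getOrCreate()
)
```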
Hope this helps.
Source: https://stackoverflow.com/questions/48236440/pyspark-adding-executors-makes-app-slower