Spark application returns different results based on different executor memory?

Submitted by 大憨熊 on 2019-12-23 04:04:27

Question


I am noticing some peculiar behaviour: I have a Spark job which reads data, does some grouping, ordering, and a join, and creates an output file.

The issue appears when I run the same job on YARN requesting more memory than the environment has, e.g. the cluster has 50 GB and I submit with close to 60 GB of total executor memory and 4 GB of driver memory. My result set shrinks; it seems like one of the data partitions or tasks is lost during processing.

--driver-memory 4g --executor-memory 4g --num-executors 12
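
For reference, the full submit command looks roughly like this (the master, deploy mode, class name, and jar path are placeholders, not the actual values from my job):

    # hypothetical main class and application jar
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --driver-memory 4g \
      --executor-memory 4g \
      --num-executors 12 \
      --class com.example.MyJob \
      my-job.jar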

I also notice this warning message on the driver:

WARN util.Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf. 

But when I run with fewer executors and less memory, for example 15 GB in total, it works and I get the exact rows/data, with no warning message.

--driver-memory 2g --executor-memory 2g --num-executors 4
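
To quantify the difference, I compare the output row counts of the two runs, e.g. assuming the job writes plain-text output to HDFS (the paths here are hypothetical):

    # count output rows from each run to see the discrepancy
    hdfs dfs -cat /output/run_60g/part-* | wc -l
    hdfs dfs -cat /output/run_15g/part-* | wc -l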

Any suggestions? Are we missing some settings on the cluster, or anything else? Please note my job completes successfully in both cases. I am using Spark version 2.2.


Answer 1:


The warning itself is meaningless (except maybe for debugging): the query plan is larger when more executors are involved, and the warning only says the plan was too big to be fully converted into a string. If you need the full plan, you can set spark.debug.maxToStringFields to a larger number (as suggested in the warning message).
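
If you do want the untruncated plan, the setting can be passed at submit time, e.g. (the value 200 is an arbitrary example, and the application details are placeholders as before):

    # raise the field limit so plan strings are not truncated in logs;
    # this only changes what is logged, not the computed results
    spark-submit \
      --conf spark.debug.maxToStringFields=200 \
      --driver-memory 4g \
      --executor-memory 4g \
      --num-executors 12 \
      --class com.example.MyJob \
      my-job.jar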



Source: https://stackoverflow.com/questions/51791008/spark-application-returns-different-results-based-on-different-executor-memory
