Question
I am executing a Spark job on a Databricks cluster, triggered by an Azure Data Factory pipeline at a 15-minute interval. After three or four successful runs, the job fails with the exception "java.lang.OutOfMemoryError: GC overhead limit exceeded".
There are many existing answers for this error, but in most of those cases the jobs never run at all, whereas in my case the job fails only after several previous runs have succeeded.
My data size is less than 20 MB.
My cluster configuration is:
So my question is: what changes should I make to the cluster configuration? And if the issue comes from my code, why does it succeed most of the time? Please advise.
Answer 1:
This is most probably related to the executor memory being a bit low. I am not sure what the current setting is, or, if it is the default, what the default value is in this particular Databricks distribution. Even when the job passes, a lot of GC will be happening because of the low memory, so it will keep failing once in a while. Under the Spark configuration, please set spark.executor.memory, along with the other parameters for the number of executors and cores per executor. With spark-submit, the config would be provided as spark-submit --conf spark.executor.memory=1g.
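
As an illustration of where these settings go, here is a minimal sketch. The values (4g of memory, 2 cores, 4 executors) and the entry point job_name.py are placeholder assumptions to be tuned for your workload, not values taken from the answer. On Databricks the key/value pairs usually go in the cluster's "Spark config" field, while the spark-submit form applies to a plain Spark deployment:

    # Databricks cluster "Spark config" field (one key/value pair per line;
    # 4g is an assumed starting point, not a recommendation):
    spark.executor.memory 4g
    spark.executor.cores 2
    spark.driver.memory 4g

    # Equivalent spark-submit invocation outside Databricks
    # (job_name.py is a hypothetical entry point):
    spark-submit \
      --conf spark.executor.memory=4g \
      --conf spark.executor.cores=2 \
      --num-executors 4 \
      job_name.py

Raising spark.executor.memory gives the executor JVMs more headroom before the "GC overhead limit exceeded" error is thrown; the right value depends on the node size chosen for the cluster.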
Source: https://stackoverflow.com/questions/58640218/databricks-spark-java-lang-outofmemoryerror-gc-overhead-limit-exceeded-i