ResourceManager Memory Leak?

Question

We have two CDH clusters running the same version (CDH-5.5.2-1.cdh5.5.2.p0.4), and the ResourceManager in each cluster has the same configuration.

One ResourceManager runs well: its heap usage stays at a roughly constant value (e.g. 800 MB) over time.

The other one throws an OOM exception and exits after about 15 days. When we use 'jmap -F -histo' to inspect its JVM heap, we see that the total size of 'char[]' objects keeps growing over time until the process finally throws OOM.
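To make the growth described above easier to track, two histogram snapshots taken some time apart can be compared class by class. The following is only a minimal sketch: the class name HistoDiff and its methods are hypothetical, and it assumes each data row looks like "rank: instances bytes class-name", which should be checked against the actual output of your JDK (the layout produced with -F may differ).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: compares two histogram snapshots.
// Assumed row layout: "<rank>: <instances> <bytes> <class name>".
public class HistoDiff {

    // Parse histogram rows into a map of class name -> total bytes.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> bytesByClass = new HashMap<>();
        for (String line : lines) {
            String[] t = line.trim().split("\\s+");
            // Skip headers, separators and anything that is not a data row.
            if (t.length < 4 || !t[0].matches("\\d+:")) {
                continue;
            }
            String className = String.join(" ", Arrays.copyOfRange(t, 3, t.length));
            bytesByClass.put(className, Long.parseLong(t[2]));
        }
        return bytesByClass;
    }

    // Return only the classes whose byte total increased, with the increase.
    static Map<String, Long> growth(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> grown = new TreeMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (delta > 0) {
                grown.put(e.getKey(), delta);
            }
        }
        return grown;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<String> first = Arrays.asList("   1:   1000   64000  char[]");
        List<String> second = Arrays.asList("   1:   1200   512000  char[]");
        System.out.println(growth(parse(first), parse(second)));
    }
}
```

Feeding it the saved output of two runs a few days apart would list the classes that are accumulating, which is how the 'char[]' growth would show up.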

The key information from the JVM histograms of the good RM and the OOM RM follows.

Dump command: jmap -F -histo pid

A) JVM dump of the good RM in cluster A [400,000+ char[] instances using 60 MB+ of heap][1]

B) JVM dump of the bad RM (OOM) in cluster B [300,000+ char[] instances, but using 400 MB+ of heap][2]
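The two figures are easier to compare as an average size per char[] instance. The sketch below does that arithmetic; it treats the reported counts and sizes as round lower bounds (400,000 instances and 60 MB, 300,000 instances and 400 MB), which is an assumption since the post only gives "+" figures, and the class name is hypothetical.

```java
// Hypothetical helper: average bytes per instance from a histogram row.
public class CharArrayAverage {

    // Integer average size of one instance, in bytes.
    static long averageBytes(long totalBytes, long instances) {
        return totalBytes / instances;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Approximate figures taken from the two dumps above.
        long good = averageBytes(60 * mb, 400_000);
        long bad = averageBytes(400 * mb, 300_000);
        System.out.println("good RM: ~" + good + " bytes per char[]");
        System.out.println("bad RM:  ~" + bad + " bytes per char[]");
    }
}
```

With these rounded inputs, the average char[] in the failing RM comes out roughly nine times larger than in the healthy one, which suggests the problem is a smaller number of large strings being retained rather than a growing number of small ones.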

Any help will be appreciated.

Update: today we took a full heap dump (jmap -F -dump:file=file.dump_result pid) and analysed it with MAT (Memory Analyzer Tool). We found that the instance variable applications (a java.util.concurrent.ConcurrentHashMap) in org.apache.hadoop.yarn.server.resourcemanager.RMActiveServiceContext consumes a large share of the heap:

call hierarchy information

instance variable: applications

Source: https://stackoverflow.com/questions/40861974/resourcemanager-memory-leak

Tags

memory-leaks

resourcemanager
