JVM Freeze under high load in longevity tests

跟風遠走 submitted on 2021-01-27 04:44:22

Question


Running with JVM:

java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

OS:

CentOS release 6.4 (Final)

JVM options:

-Xmx4g -Xms4g -XX:MaxPermSize=4g -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintClassHistogram -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+DisableExplicitGC

The application runs in an OSGi environment, with Aerospike as the database and Netty (NIO) for networking.

I ran a longevity test over a weekend. This was the last line printed before the freeze:

[2015-12-11 09:54:51,185] INFO  : [GC pause (young)

After 2 days I ran strace on the pid, which released the freeze, and these were the next lines printed:

[2015-12-11 09:54:51,185] INFO  : [GC pause (young) 3598M->1458M(4096M), 0.0280020 secs]
[2015-12-13 11:54:54,353] INFO  : [GC pause (young) 3598M->1464M(4096M), 180001.5628870 secs]

The first line was completed only at that point, and the next line reported a GC pause of about 2 days.
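For anyone chasing a similar freeze in a long log, here is a minimal sketch of how such lines could be scanned for abnormal pauses. It assumes the `-verbose:gc` line format shown above; the class name `GcPauseScanner`, its method names, and the threshold are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any JDK tool: it only parses text lines.
public class GcPauseScanner {
    // Matches a completed pause entry such as
    // "[GC pause (young) 3598M->1458M(4096M), 0.0280020 secs]".
    // Assumption: the log uses exactly the format shown in the question.
    private static final Pattern PAUSE =
            Pattern.compile("\\[GC pause \\(\\w+\\) .*?, (\\d+\\.\\d+) secs\\]");

    /** Returns the pause length in seconds, or -1 if the line has no completed pause entry. */
    public static double pauseSeconds(String line) {
        Matcher m = PAUSE.matcher(line);
        return m.find() ? Double.parseDouble(m.group(1)) : -1.0;
    }

    /** Returns the lines whose reported pause exceeds thresholdSeconds. */
    public static List<String> suspicious(List<String> lines, double thresholdSeconds) {
        List<String> result = new ArrayList<String>();
        for (String line : lines) {
            if (pauseSeconds(line) > thresholdSeconds) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "[2015-12-11 09:54:51,185] INFO  : [GC pause (young) 3598M->1458M(4096M), 0.0280020 secs]",
                "[2015-12-13 11:54:54,353] INFO  : [GC pause (young) 3598M->1464M(4096M), 180001.5628870 secs]");
        // The 10-second threshold is an arbitrary choice for this sketch.
        for (String line : suspicious(log, 10.0)) {
            System.out.println("Suspicious pause: " + line);
        }
    }
}
```

A scan like this only finds the freeze after the fact, because the frozen JVM does not finish writing the line until it resumes. It says nothing about the cause.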

The JVM did not respond to thread dump signals during the freeze (pkill -QUIT pid). The freeze recurs every few days, and it happens with the CMS collector as well as with G1. How can I start debugging this, and what could be causing it?

Thank you.

EDIT: I had another freeze. This time strace did not release it; the second freeze was released by running jstack.

UPDATE: Found the problem! See the answer below.


Answer 1:


I found the problem!
It is a kernel bug in futex_wait() that was backported to our kernel version.
You can read about it here:
https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64



Source: https://stackoverflow.com/questions/34251580/jvm-freeze-under-high-load-in-longevity-tests
