Question
We have an application running on the WildFly app server in clustered mode (6 nodes). We sometimes see a JVM freeze of 16 seconds when a GC is triggered. The application is time sensitive, and the other nodes in the cluster consider a node dead if its heartbeat response is not received within 15 seconds, so a node whose JVM pauses for longer than that destabilizes the whole application. To understand what goes on during GC, we enabled the hotspot and safepoint logs and see the following traces when there is a GC pause.
Can anybody explain what is meant by the following parameters?
1.) active_workers(): 13
2.) new_acitve_workers: 13
3.) prev_active_workers: 13
4.) active_workers_by_JT: 3556
5.) active_workers_by_heap_size: 146
Environment details:
- OS: Linux 64-bit, RHEL 7
- JVM: OpenJDK 1.8
- Heap size: 12GB (young: 4GB, tenured: 8GB)
- CPU cores: 16
- Virtualization: VMware ESX 5.1
JVM Arguments:
-XX:ThreadStackSize=512
-Xmx12288m
-XX:+UseParallelGC
-XX:+UseParallelOldGC
-XX:MaxPermSize=1024m
-XX:+DisableExplicitGC
-XX:NewSize=4096m
-XX:MaxNewSize=4096m
-XX:ReservedCodeCacheSize=256m
-XX:+UseCodeCacheFlushing
-XX:+UseDynamicNumberOfGCThreads
Any suggestions for tuning these JVM parameters to reduce the GC pause time?
GC logs:
GCTaskManager::calc_default_active_workers() : active_workers(): 13 new_acitve_workers: 13 prev_active_workers: 13
active_workers_by_JT: 3556 active_workers_by_heap_size: 146
GCTaskManager::set_active_gang(): all_workers_active() 1 workers 13 active 13 ParallelGCThreads 13
JT: 1778 workers 13 active 13 idle 0 more 0
2016-10-06T07:38:47.281+0530: 48313.522: [Full GC (Ergonomics) DrainStacksCompactionTask::do_it which = 3 which_stack_index = 3/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 7 which_stack_index = 7/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 2 which_stack_index = 2/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 0 which_stack_index = 0/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 11 which_stack_index = 11/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 6 which_stack_index = 6/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 1 which_stack_index = 1/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 12 which_stack_index = 12/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 4 which_stack_index = 4/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 5 which_stack_index = 5/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 9 which_stack_index = 9/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 8 which_stack_index = 8/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 10 which_stack_index = 10/empty(0) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 3 region_stack = 0x780be610 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 5 region_stack = 0x780be730 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 7 region_stack = 0x780be850 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 11 region_stack = 0x780bea90 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 1 region_stack = 0x780be4f0 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 10 region_stack = 0x780bea00 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 8 region_stack = 0x780be8e0 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 4 region_stack = 0x780be6a0 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 0 region_stack = 0x780be460 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 2 region_stack = 0x780be580 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 6 region_stack = 0x780be7c0 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 12 region_stack = 0x780beb20 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 9 region_stack = 0x780be970 empty (1) use all workers 1
[PSYoungGen: 63998K->0K(4082176K)] [ParOldGen: 8346270K->3657870K(8388608K)] 8410268K->3657870K(12470784K), [Metaspace: 465864K->465775K(1495040K)], 16.0898939 secs]
[Times: user=180.57 sys=2.46, real=16.09 secs]
2016-10-06T07:39:03.373+0530: 48329.615: Total time for which application threads were stopped: 16.2510644 seconds, Stopping threads took: 0.0036805 seconds
Safepoint logs:
48313.363: ParallelGCFailedAllocation [ 2384 0 2 ] [ 0 0 3 35 16210 ] 0
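Assuming the standard -XX:+PrintSafepointStatistics column layout in JDK 8 (vmop [ threads: total initially_running wait_to_block ] [ time(ms): spin block sync cleanup vmop ] page_trap_count), this line decodes as:

total threads: 2384, initially running: 0, waiting to block: 2
spin: 0 ms, block: 0 ms, sync: 3 ms, cleanup: 35 ms, vmop: 16210 ms

In other words, reaching the safepoint was cheap (3 ms of sync, matching the "Stopping threads took: 0.0036805 seconds" above); virtually the whole 16.25-second stop is the VM operation itself (vmop: 16210 ms, matching the 16.0898939 secs Full GC in the GC log).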
Thanks in advance for your help.
Answer 1:
Judging by the ParallelGCFailedAllocation safepoint entry and the Full GC line

[PSYoungGen: 63998K->0K(4082176K)] [ParOldGen: 8346270K->3657870K(8388608K)] 8410268K->3657870K(12470784K), [Metaspace: 465864K->465775K(1495040K)], 16.0898939 secs]

we have the following conditions:
- YoungGen is almost empty (only 63M occupied out of 4G)
- OldGen is almost full (only 42M left out of 8.3G)
- The JVM tried to promote surviving objects from the YoungGen, or failed to allocate them in the survivor space, and decided to move them to the OldGen
- The OldGen had insufficient space as well (only the 42M mentioned above), so a Full GC was triggered
- The Full GC reclaimed roughly 4.5G of the OldGen (8346270K->3657870K); the arithmetic is spelled out below
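For reference, here is the arithmetic behind those bullets, taken straight from the Full GC line:

YoungGen occupancy before the Full GC: 63998K of 4082176K (~63M of ~4G)
OldGen free space before the Full GC: 8388608K - 8346270K = 42338K (~42M)
OldGen reclaimed by the Full GC: 8346270K - 3657870K = 4688400K (~4.5G)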
Even with 13 GC threads running in parallel, collecting those ~4.5G takes 16 seconds, and since you have only 16 cores there is not much room for improvement by adding more threads.
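As a side note on the traced parameters from the question (the constants below are the OpenJDK 8 HotSpot defaults as I read them, so treat the exact values as assumptions rather than a specification):

ParallelGCThreads (default on machines with more than 8 CPUs): 8 + (16 - 8) * 5/8 = 13
active_workers_by_JT: GCWorkersPerJavaThread (default 2) * Java thread count = 2 * 1778 = 3556
active_workers_by_heap_size: committed heap / HeapSizePerGCThread (~83M on 64-bit) = 12470784K / ~83M ≈ 146

With -XX:+UseDynamicNumberOfGCThreads the JVM takes the minimum of these heuristic bounds, capped at ParallelGCThreads, which is why active_workers(), new_acitve_workers and prev_active_workers all stay at 13: both bounds are far above the cap, so all 13 workers remain active.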
The following might be happening here:
- Your objects live too long for the YoungGen. In that case you could switch to CMS/G1, which collects the OldGen more frequently (and mostly concurrently), so it takes less stop-the-world time in total. You would need to tune InitiatingHeapOccupancyPercent to your needs; judging by the current output, you should initiate the concurrent cycle somewhere around 4G of occupancy. It also puts in question whether you really need those 12G of heap, since such a large heap is subject to fragmentation issues.
- Your objects are short-lived but too big to be accommodated in the survivor space. In that case you would need to tune the SurvivorRatio parameter to make the survivor spaces bigger, e.g. SurvivorRatio=4 (each survivor space is then NewSize / (SurvivorRatio + 2), i.e. about 680M with your 4G young generation; see the worked sizing below).
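A quick sanity check of that sizing, assuming the standard HotSpot young-generation layout (young gen = eden + 2 survivor spaces, eden/survivor = SurvivorRatio):

survivor space = NewSize / (SurvivorRatio + 2) = 4096M / 6 ≈ 683M
eden = 4096M - 2 * 683M ≈ 2730M

For comparison, the default SurvivorRatio=8 would give 4096M / 10 ≈ 410M per survivor space.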
So it really depends on your object allocation pattern. The best approach would be to try these changes somewhere other than production before applying them there.
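As a concrete starting point, a G1 variant of the flags might look like this (a sketch only: the pause target and occupancy threshold are assumptions to be validated against your workload, not a drop-in recommendation):

-Xmx12288m
-XX:+UseG1GC
-XX:InitiatingHeapOccupancyPercent=35
-XX:MaxGCPauseMillis=200
-XX:ThreadStackSize=512
-XX:ReservedCodeCacheSize=256m
-XX:+UseCodeCacheFlushing

InitiatingHeapOccupancyPercent=35 starts concurrent marking around 35% of 12G ≈ 4.2G, in line with the ~4G suggested above, and MaxGCPauseMillis=200 is a soft goal well under the 15-second heartbeat. Note that -XX:NewSize/-XX:MaxNewSize and -XX:+UseParallelGC/-XX:+UseParallelOldGC should be dropped, since pinning the young-generation size defeats G1's pause-time heuristics; -XX:MaxPermSize is ignored on JDK 8 anyway, where PermGen was replaced by Metaspace (as the Metaspace entry in your GC log confirms).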
The full list of JVM GC parameters can be found in the Oracle HotSpot documentation.
Source: https://stackoverflow.com/questions/39891275/jvm-flags-xxusedynamicnumberofgcthreads-xxtracedynamicgcthreads-enabled-to