Is GC smart enough to remove objects that are referenced but no longer used?

Question

Let's say I have an object called "master" which owns 100 objects called "slave0" through "slave99". (This is not an array; my "master" class has 100 fields named slave0 through slave99.) Now, let's say my program first reads in a file that contains the serialized version of a "master" object. But let's say my program never uses objects slave50 through slave99. What will happen? (My guess is that the Java program will first read all 100 slave objects as part of deserialization, and only after reading all 100 in might it choose to do a GC, at which point objects slave50 through slave99 will be removed by the GC and their memory reclaimed. Is this right? NOTE: Object "master" is still being used, so technically objects slave50 through slave99 are still referenced by the parent object, master, and master is still being actively used.)

Follow-up question

So let's say my guess above is correct regarding how the GC works. What happens if my long-running program spends, say, a few minutes processing objects slave0 through slave50, but then enters a final (long-running) procedure named "X" that ONLY processes objects slave0 through slave25? Even though objects slave26 through slave50 are still referenced by the parent object master, and even though master is still being used, would the GC be smart enough to get rid of objects slave26 through slave50, since nothing is ever going to use them from "procedure X" onwards?

Answer 1:

There is no simple answer to this. You say “Object ‘master’ is still being used”, but not in which way. In principle, reading and writing the fields of an object, and even invoking methods on it, can be optimized so that the object’s memory is no longer required.

Or, as the specification puts it:

Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. For example, a Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner.

Another example of this occurs if the values in an object's fields are stored in registers. The program may then access the registers instead of the object, and never access the object again. This would imply that the object is garbage.

As discussed in “finalize() called on strongly reachable object in Java 8”, this is more than a theoretical issue.

But the specification also says:

Note that this sort of optimization is only allowed if references are on the stack, not stored in the heap.

… (within the example) the inner class object should be reachable for as long as the outer class object is reachable.

This implies that as long as your “master” object holds references to “slave50” through “slave99”, they must be considered reachable for as long as the “master” object is considered reachable. In principle, though, the master and all of its slaves may be collected together. By the same rules, even the still-in-use “slave0” through “slave25” could be collected then, if the optimized code is capable of running without accessing their memory again.
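The heap rule can be observed with a small probe. The following is only a sketch: the Slave and Master classes are hypothetical stand-ins for the question's classes, and a static field is assumed as the thing that keeps the master reachable. A WeakReference is used purely as an observer, since it does not keep its referent alive.

```java
import java.lang.ref.WeakReference;

// Hypothetical stand-ins for the classes described in the question.
final class Slave {
    final byte[] payload = new byte[1024];
}

final class Master {
    Slave slave50 = new Slave(); // heap reference from the master to a slave
}

class HeapReachabilityDemo {
    // A static field keeps this master strongly reachable for the whole run.
    static final Master MASTER = new Master();

    // Builds the probe in its own frame so the caller holds no strong local reference.
    static WeakReference<Slave> probe() {
        return new WeakReference<>(MASTER.slave50);
    }

    // True if slave50 is still present after a GC request.
    static boolean slaveSurvivesGc() {
        WeakReference<Slave> probe = probe();
        System.gc(); // only a request; the JVM may ignore it
        // slave50 is never used by the program, but it is still referenced
        // from a reachable heap object, so the weak probe cannot be cleared.
        return probe.get() != null;
    }

    public static void main(String[] args) {
        System.out.println("slave50 still reachable after GC: " + slaveSurvivesGc());
    }
}
```

Whether or not the JVM actually runs a collection, the probe stays set, because the unused slave is referenced from a field of a reachable object.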

Note that since optimized code is intended to behave just like the original code, your program won’t notice the difference.

So there are capabilities to detect unused objects, even if they “would naively be considered reachable”, but they usually depend on the optimization state of the method’s code, because the garbage collector does not analyze code; the JVM’s optimizer does. In that regard, local variable scope is purely a compile-time concept. For unoptimized code, the garbage collector may sometimes consider a reference to still exist although the local variable is out of scope from the source code’s perspective. More often than not, it happens the other way round: unused local variables disappear in optimized code even while they are in scope from the source code’s perspective. In either case, returning from a method destroys the entire stack frame, including all local variables, so you never need to set local variables to null before returning.
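For the rare code that does depend on a local variable staying reachable up to a certain point, Java 9 and later offer Reference.reachabilityFence. The sketch below is an illustration under that assumption, not something the original discussion prescribes; the class and method names are made up for the example.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

class FenceDemo {
    // True if the object was still present when probed before the fence.
    static boolean reachableUntilFence() {
        Object resource = new Object(); // referenced only from this stack frame
        WeakReference<Object> probe = new WeakReference<>(resource);
        try {
            System.gc(); // only a request; the JVM may ignore it
            // 'resource' is not read again by ordinary code, so without the
            // fence below, optimized code would be allowed to treat it as garbage here.
            return probe.get() != null;
        } finally {
            // Keeps 'resource' strongly reachable at least until this call (Java 9+).
            Reference.reachabilityFence(resource);
        }
    }

    public static void main(String[] args) {
        System.out.println("reachable before the fence: " + reachableUntilFence());
    }
}
```

The fence does not help the garbage collector; it restricts it, which is why it belongs only in code that has a demonstrated need for it.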

The best strategy is to never insert any explicit action to “help the garbage collector” unless you encounter an actual problem with a scenario the JVM can’t handle sufficiently. Such scenarios are really rare.

Answer 2:

In Java, the GC won't remove a live object. In tracing GC logic, an object is considered live when it is reachable from an active thread (unless we are considering more exotic reference types, e.g. WeakReference). In your simple example, all fields of the master object are reachable, since the master object itself is reachable from the main thread.
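If rarely used slaves really did cause memory pressure, one option built on those reference types is to hold them softly and reload them on demand. This is a minimal sketch under stated assumptions: SoftSlot is a hypothetical helper, not a JDK class, and the loader stands in for whatever would re-read the slave from the serialized file.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical holder: keeps a value only softly reachable and reloads it when needed.
final class SoftSlot<T> {
    private final Supplier<T> loader; // assumed to return a non-null value
    private SoftReference<T> ref = new SoftReference<>(null);

    SoftSlot(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        T value = ref.get(); // null if never loaded or cleared by the GC
        if (value == null) {
            value = loader.get();
            ref = new SoftReference<>(value);
        }
        return value; // the caller's strong reference keeps it alive while in use
    }
}

class SoftSlotDemo {
    public static void main(String[] args) {
        // The lambda stands in for re-reading slave50 from storage.
        SoftSlot<int[]> slave50 = new SoftSlot<>(() -> new int[1000]);
        int[] data = slave50.get();
        System.out.println("slave50 length: " + data.length);
    }
}
```

With this layout the master stays reachable, while a slave that nobody currently holds may be reclaimed when memory runs low and is rebuilt on the next call to get().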

There are various articles you can read on tracing GC:

Tracing garbage collection
Concurrent Mark and Sweep algorithm details
What's the difference between SoftReference and WeakReference in Java?

Source: https://stackoverflow.com/questions/51636212/is-gc-smart-enough-to-remove-objects-that-are-referenced-but-no-longer-used

Tags

java

garbage-collection