I have recently read bits and pieces about garbage collection (mostly in Java) and one question still remains unanswered: how does a JVM (or runtime system in general) keep
The HotSpot VM generates a GC map for each subroutine it compiles, which contains information about where the roots are. For example, suppose it has compiled a subroutine to machine code (the principle is the same for bytecode) that is 120 bytes long; the GC map for it could then look something like this:
0 : [RAX, RBX]
4 : [RAX, [RSP+0]]
10 : [RBX, RSI, [RSP+0]]
...
120 : [[RSP+0],[RSP+8]]
Here [RSP+x] is supposed to indicate stack locations and R?? registers. So if the thread is stopped at the assembly instruction at offset 10 and a GC cycle runs, then HotSpot knows that the three roots are in RBX, RSI and [RSP+0]. It traces those roots and updates the pointers if it has to move the objects.
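To make the lookup concrete, here is a minimal Java sketch of the toy GC map above. The class GcMapSketch and its methods record, rootsAt and example are names I made up for illustration; this only models the idea of mapping a code offset to its root locations and is not HotSpot's actual data structure or API.

```java
import java.util.List;
import java.util.TreeMap;

public class GcMapSketch {
    // Hypothetical toy GC map: instruction offset -> locations that hold object references.
    private final TreeMap<Integer, List<String>> rootsByOffset = new TreeMap<>();

    // Records which registers and stack slots are roots at the given instruction offset.
    public void record(int offset, String... locations) {
        rootsByOffset.put(offset, List.of(locations));
    }

    // Returns the root locations for the instruction that starts at the given offset.
    public List<String> rootsAt(int offset) {
        List<String> roots = rootsByOffset.get(offset);
        if (roots == null) {
            throw new IllegalArgumentException("no GC map entry for offset " + offset);
        }
        return roots;
    }

    // Builds the first three entries of the example map shown above.
    public static GcMapSketch example() {
        GcMapSketch map = new GcMapSketch();
        map.record(0, "RAX", "RBX");
        map.record(4, "RAX", "[RSP+0]");
        map.record(10, "RBX", "RSI", "[RSP+0]");
        return map;
    }

    public static void main(String[] args) {
        // A thread stopped at offset 10: the collector would scan exactly these locations.
        System.out.println(example().rootsAt(10)); // prints [RBX, RSI, [RSP+0]]
    }
}
```

A real collector would then read the pointer stored in each of those locations, trace from it, and write the new address back if the object was moved.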
The format I've described for the GC map only demonstrates the principle and is obviously not the one HotSpot actually uses. It is incomplete, because it contains no information about registers and stack slots that hold live primitive values, and it is not space efficient, because it uses a list for every instruction offset. There are many ways to pack the information much more compactly.
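As one illustration of a denser encoding, the list for each offset could be replaced by a single bitmask. The sketch below is my own assumption, not HotSpot's format: the register numbering, the choice of bit 16 as the start of the stack slots, and the names PackedGcMapSketch, registerBit, stackSlotBit and decode are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PackedGcMapSketch {
    // Register numbering assumed for this sketch only; bit i stands for REGISTERS[i].
    static final String[] REGISTERS = {"RAX", "RBX", "RCX", "RDX", "RSI", "RDI"};
    // Bits 16..63 stand for the 8-byte stack slots [RSP+0], [RSP+8], ...
    static final int STACK_BASE = 16;

    // Returns the mask bit that marks the named register as holding a reference.
    static long registerBit(String name) {
        for (int i = 0; i < REGISTERS.length; i++) {
            if (REGISTERS[i].equals(name)) return 1L << i;
        }
        throw new IllegalArgumentException("unknown register " + name);
    }

    // Returns the mask bit that marks the stack slot at [RSP+byteOffset] as a reference.
    static long stackSlotBit(int byteOffset) {
        int slot = byteOffset / 8;
        if (byteOffset < 0 || byteOffset % 8 != 0 || STACK_BASE + slot > 63) {
            throw new IllegalArgumentException("unsupported stack offset " + byteOffset);
        }
        return 1L << (STACK_BASE + slot);
    }

    // Expands a mask back into the readable list form used in the example map.
    static List<String> decode(long mask) {
        List<String> roots = new ArrayList<>();
        for (int i = 0; i < REGISTERS.length; i++) {
            if ((mask & (1L << i)) != 0) roots.add(REGISTERS[i]);
        }
        for (int bit = STACK_BASE; bit < 64; bit++) {
            if ((mask & (1L << bit)) != 0) roots.add("[RSP+" + (bit - STACK_BASE) * 8 + "]");
        }
        return roots;
    }

    public static void main(String[] args) {
        // The entry for offset 10 from the example map, packed into one 64-bit word.
        long atOffset10 = registerBit("RBX") | registerBit("RSI") | stackSlotBit(0);
        System.out.println(decode(atOffset10)); // prints [RBX, RSI, [RSP+0]]
    }
}
```

With this layout every entry costs eight bytes regardless of how many roots it has. Storing entries only for offsets where a thread can actually be stopped would shrink the map further.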