A blog about programming topics, mostly JVM, Linux kernel, and x86 related things.

Monday, July 27, 2009

Garbage Collection and JIT

As things are improving nicely on the JIT side of things in Jato, I've been reading up on garbage collection lately. Now, while the actual GC design itself is obviously very important, the first step in implementing GC for Jato is to figure out how the GC discovers the root set of references.

Most GCs seem to use safe-points to suspend all running threads before doing the actual collection as explained in the GC Points in a Threaded Environment (1998) paper. We have much the infrastructure to implement safe-points in place because we already do various things through hardware traps (exception handling, class initialization) which leaves us with the problem of how to discover the actual references once all threads have been rolled forward to their respective safe points.

The tricky part in JIT'd code is that some root set references can be on the native stack or even in hardware registers. While the problem might seem hard to solve at first, there's a pretty natural solution to it called "GC maps" (also known as stack maps and register maps). The Compiler Support for Garbage Collection in a Statically Typed Language (1992) paper has a pretty good explanation of how to implement GC maps in an efficient way.

The basic idea with GC maps is that the JIT compiler generates a per-safepoint map of all the possible locations which can contain object references. Once we reach a safepoint, the GC will use the map to look up references that are either on the native stack or in hardware registers. We also need to walk the stack to look up references at all call-sites up to the top-most caller of each thread. This is pretty straight-forward as all call-sites are usually also safepoints and thus already have the GC maps in place.

When we combine the object references looked up via GC maps with class variables (i.e. statics) and local variables, we have the full root set and are able to do garbage collection in multi-threaded environment that executes JIT'd code. There are other important design decisions (e.g. stop the world vs. concurrent) which affect GC performance a lot but safepoints and GC maps are compatible with most of the different choices we have.

No comments: