A blog about programming topics, mostly JVM, Linux kernel, and x86 related things.

Monday, July 27, 2009

Garbage Collection and JIT

As things are improving nicely on the JIT side of things in Jato, I've been reading up on garbage collection lately. Now, while the actual GC design itself is obviously very important, the first step in implementing GC for Jato is to figure out how the GC discovers the root set of references.

Most GCs seem to use safe-points to suspend all running threads before doing the actual collection, as explained in the paper GC Points in a Threaded Environment (1998). We already have much of the infrastructure needed to implement safe-points because we do various things through hardware traps (exception handling, class initialization). That leaves us with the problem of how to discover the actual references once all threads have been rolled forward to their respective safe-points.

The tricky part in JIT'd code is that some root set references can be on the native stack or even in hardware registers. While the problem might seem hard to solve at first, there's a pretty natural solution to it called "GC maps" (also known as stack maps and register maps). The Compiler Support for Garbage Collection in a Statically Typed Language (1992) paper has a pretty good explanation of how to implement GC maps in an efficient way.

The basic idea with GC maps is that the JIT compiler generates a per-safepoint map of all the locations that can contain object references. Once we reach a safepoint, the GC uses the map to look up references that are either on the native stack or in hardware registers. We also need to walk the stack to look up references at all call sites up to the top-most caller of each thread. This is pretty straightforward because call sites are usually also safepoints and thus already have GC maps in place.
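As a rough illustration, a per-safepoint GC map can be as simple as two bitmaps, one over stack slots and one over machine registers; the GC then enumerates the set bits of the map attached to each frame's safepoint. A hypothetical sketch in C (the layout and names are illustrative, not Jato's actual data structures):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-safepoint GC map: one bitmap for stack slots that
 * hold references and one for machine registers that do. */
struct gc_map {
	unsigned long	ip;		/* safepoint address in JIT'd code */
	uint32_t	stack_refs;	/* bit i set: stack slot i holds a reference */
	uint32_t	reg_refs;	/* bit i set: register i holds a reference */
};

/* Count the root-set locations one frame contributes at a safepoint. */
static unsigned int gc_map_nr_roots(const struct gc_map *map)
{
	unsigned int nr = 0;

	for (int i = 0; i < 32; i++) {
		if (map->stack_refs & (1U << i))
			nr++;
		if (map->reg_refs & (1U << i))
			nr++;
	}
	return nr;
}
```

During a collection, the stack walker would look up the `gc_map` for each frame's safepoint address and visit exactly those slots and registers, rather than conservatively scanning the whole native stack.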

When we combine the object references looked up via GC maps with class variables (i.e. statics) and local variables, we have the full root set and are able to do garbage collection in a multi-threaded environment that executes JIT'd code. There are other important design decisions (e.g. stop-the-world vs. concurrent) that affect GC performance a lot, but safepoints and GC maps are compatible with most of the choices we have.

Jato bytecode coverage at 80% on i386

We support 80% of the JVM instruction set on i386 as of yesterday. That's a nice improvement from the ~60% coverage we had before Google Summer of Code 2009 kicked into full gear. What we're still missing is support for the double data type, 64-bit arrays, and switch statements. On x86-64, things are slowly getting better and we're already able to run some of the regression tests. Hopefully Eduard-Gabriel Munteanu will be able to reuse much of the 32-bit code on 64-bit to get better coverage soon.

But it's not as if we're going to run out of interesting things to hack on after we hit the 100% mark on bytecode coverage. Arthur Huillet is working on register allocator bugs that prevent us from executing System.out.println() successfully and we haven't really started to work on GC support yet which means you won't be able to run any real applications any time soon. We don't really support java.lang.Thread yet either which is likely to expose some interesting bugs in the VM. There's also some more core VM, JNI, and related work to do although Tomek Grabiec and Vegard Nossum have been making tons of progress in that area.

So if you're an aspiring VM/JIT/GC hacker, drop by at the #jato channel on irc.freenode.net and join the fun!

Wednesday, July 22, 2009

Performance of alternative languages on the JVM

Cliff Click's JavaOne "Alternative languages on the JVM" presentation slides have a nice summary of how well alternative languages such as Scala, Clojure, and JRuby compile down to "bare metal" on the JVM. The slides are a good read for anyone wanting to use any of the alternative languages in performance critical environments.

Needless to say, Scala performs the best because it lets you follow the Java semantics, although if you write Scala code in its native "elegant" style, it's almost as bad as everyone else.

Tuesday, July 7, 2009

Stack traces in Jato

Tomek finished support for stack traces:

  penberg@penberg-laptop:~/src/jato$ cat Exception.java 
public class Exception {
    public static void foo() {
        throw new RuntimeException("error");
    }

    public static void main(String[] args) {
        foo();
    }
}
penberg@penberg-laptop:~/src/jato$ javac Exception.java
penberg@penberg-laptop:~/src/jato$ ./scripts/java Exception

[snip]

Exception in thread "main" java/lang/RuntimeException: error
   at Exception.foo(Exception.java:3)
   at Exception.main(Exception.java:7)


Yay!

Sunday, July 5, 2009

Efficient x87 code generation

Jato uses the SSE1 instruction set for floating point arithmetic but we obviously need to support the older stack-based x87 instruction set if we want to be able to run JIT'd code on older CPUs. There are some papers out there that suggest mapping the stack-based bytecode instructions to x87 but implementing something like that is probably going to be painful because all other bytecodes are converted to register based representation in the front-end.

Luckily, it turns out that the recommended programming model for the x87 instruction set is much simpler than that. Quoting the paper "Some notes on the new MLRISC x86 floating point code generator (draft)":

[...] the preferred code generation methodology advocated by the Intel optimization guide [Cor97] is to treat the floating point stack as a bank of 8 registers, and use the fxch instruction to swap contents of the stack as needed. The instruction fxch is implemented as register renaming internally by the processor and incurs no cost if paired with most floating point instructions.


The paper goes on to explain how you are basically able to "by-pass" the stack model with the fxch instruction and pretend that you have 8 normal FPU registers for register allocation. The approach requires an instruction rewriting phase before code emission but overall, it's a much better fit for Jato than having to deal with the x87 stack.
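The rewriting phase then mostly boils down to bookkeeping: track which virtual FP register currently sits in each stack slot ST(0)..ST(7), and emit an fxch whenever the operand we need is not at the top. A hypothetical sketch of that bookkeeping (names are illustrative, not Jato's code):

```c
#include <assert.h>

#define NR_FP_REGS 8

/* Hypothetical x87 rewriting state: slot[i] holds the number of the
 * virtual FP register currently living in stack position ST(i). */
struct x87_state {
	int slot[NR_FP_REGS];
};

/* Bring virtual register 'vreg' to ST(0), modelling the effect of
 * "fxch %st(i)", which swaps ST(0) and ST(i). Returns the stack index
 * the fxch would name (0 means no swap was needed), or -1 if 'vreg'
 * is not on the FPU stack at all. */
static int x87_bring_to_top(struct x87_state *state, int vreg)
{
	for (int i = 0; i < NR_FP_REGS; i++) {
		if (state->slot[i] == vreg) {
			if (i != 0) {
				int tmp = state->slot[0];

				state->slot[0] = state->slot[i];
				state->slot[i] = tmp;
			}
			return i;
		}
	}
	return -1;
}
```

Since fxch is handled by register renaming inside the CPU, the swaps this phase inserts are essentially free, which is what makes the "bank of 8 registers" model viable.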

Saturday, July 4, 2009

invokeinterface support in Jato

Vegard Nossum completed support for invokeinterface in just two days! The implementation uses interface method tables (IMTs), which are similar to the virtual method tables used by invokevirtual except that they can handle "multiple inheritance" when a class implements multiple interfaces.
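The usual IMT scheme gives every class a small fixed-size table indexed by a hash of the interface method, so a single call site works across unrelated classes that implement the same interface. A hypothetical sketch (not Jato's actual implementation; a real VM also needs conflict-resolution stubs for hash collisions):

```c
#include <assert.h>
#include <stddef.h>

#define IMT_SIZE 32

typedef void (*method_t)(void);

/* Hypothetical class layout: a fixed-size interface method table,
 * one slot per interface-method hash bucket. */
struct vm_class {
	method_t imt[IMT_SIZE];
};

/* Toy hash over the method's name and descriptor. */
static unsigned int imt_hash(const char *name)
{
	unsigned int hash = 0;

	while (*name)
		hash = hash * 31 + (unsigned char) *name++;
	return hash % IMT_SIZE;
}

static void imt_install(struct vm_class *klass, const char *name, method_t method)
{
	klass->imt[imt_hash(name)] = method;
}

/* invokeinterface dispatch: one hash, one indexed load. */
static method_t imt_dispatch(struct vm_class *klass, const char *name)
{
	return klass->imt[imt_hash(name)];
}

/* Example target method used below. */
static void dummy_method(void) { }
```

The point of the hash is that the call site does not need to know where the method lands in any particular class's vtable, which is exactly the information that is unavailable when the receiver is only known to implement an interface.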

Rubinius

If you're interested in virtual machine hacking, check out Rubinius "book tour" for an excellent list of books and other resources on virtual machine implementation details. Highly recommended reading!

Friday, July 3, 2009

Floating point support in Jato

Arthur Huillet has been working on floating point support for Jato. Today, I merged some initial FPU support patches which brings our bytecode coverage to 70%! There's still some work to do (we don't support doubles, for example) but I'm pretty confident we'll have complete FPU support in Jato soon.

We're using SSE1 instructions for the FPU support because the stack-based x87 model is just really weird and inefficient. We wanted to use SSE2, but as it turns out, 32-bit AMD CPUs only support SSE1. I guess that's a good thing, though, as we now support old Intel CPUs all the way back to the Pentium III.