A blog about programming topics, mostly JVM, Linux kernel, and x86 related things.

Monday, July 21, 2008

Jato status update #1

Lots of things have happened in Jato this summer.

Saeed has implemented support for the anewarray, arraylength, and multianewarray bytecode instructions. He's now working on iastore and instanceof instructions but the former is put on back burner because of register allocation limitations. Arthur has been working on 64-bit support and lot patches from him have been merged. Unfortunately it turns out it's really hard to do any of the actual arithmetic operations until we have the infrastructure working. In fact, even Saeed hit the similar problem with his work on iastore.

To make the long story short, for historical reasons, Jato uses the EAX register as a sort of scratch register in the instruction selector by calling get_fixed_var(). The get_fixed_var() function is really meant only for cases where the x86 instruction set requires a fixed register to be used as an operand. The code that uses the function unnecessarily was written when Jato had no register allocator and that was the only API we could use. In new code, you're supposed to be using get_var() which allocates a new temporary (or virtual register) that gets a machine register assigned to it after the instruction selection. Arthur has been working on removing the unnecessary uses of get_fixed_var() and in the process, covered some latent bugs in the code.

With the removal of unnecessary uses of get_fixed_var, the lack of support for spilling and handling of caller and callee saved registers in the register allocation becomes much more painful. So Arthur is now working on fixing up the register allocator so we can go ahead and remove the uses of get_fixed_var(). Hopefully we can get that sorted out soon.

binalloc updates

I re-implemented the freelist handling in binalloc with red-black trees as oprofile (quite unsurprisingly) indicated that the linear scan in kfree() that kept the freelist sorted by size was the bottleneck. After that, the performance problems I observed with binalloc on my machine went away. So I suspect the freelist simply got bigger and bigger after a while making the linear scan even worse.

After running some memory efficiency tests, it turned out that binalloc performed very badly compared to SLOB and SLUB. It was using roughly 64 MBs of total of 128 MB after booting whereas both SLOB and SLUB were using somewhere around 5 MB. Luckily, Johannes Weiner spotted a bug in my red-black tree traversal code that caused binalloc to never re-use the free'd chunks. After he fixed the bug and the memory efficiency of binalloc is in the same ball park as the other allocators.

As binalloc is not quite as efficient as SLUB or SLOB, I'm now working on shrinking the size of boundary tags. I have a patch that shrinks it to eight bytes but I'm still debugging a crash that's likely due to some latent bug in the code. Lets see where we're at after that.

Friday, July 4, 2008

binalloc boots on real hardware!

With little help from Vegard Nossum, I was finally able to nail down two remaining bugs in binalloc, a general purpose best-fit kernel memory allocator, that prevented me from booting it on real hardware. Unfortunately, compiling a kernel was dead slow and the machine seemed to get slower and slower as time passed. Shutting down the laptop took 5-10 minutes!

So some bugs are still lurking there but at least I've made some forward progress. Yay!