A blog about programming topics, mostly JVM, Linux kernel, and x86 related things.

Monday, January 26, 2009

Reverse Loop

This "micro benchmark" really made my day!

Yes, I know, it violates almost every rule of a good micro benchmark, but I actually ran it and the fun part is that HotSpot doesn't even compile the main() method that has the loop that is being "performance tested"! And as the program has tons of GC activity going on so any results are lost in the noise anyway.

That said, I'm not sure which is funnier, the failed attempt at benchmark or the conclusion that "reverse loop is a good programming practice" because "it can increase the performance." In any case, I'm looking forward to a follow up article where they benchmark instanceof! Hopefully we can get the age old instanceof vs virtual methods debate resolved once and for all.

Non-Empty Operand Stack at Entry and Exit of Basic Blocks in the Java Virtual Machine

The JVM instruction set is stack based which means that instructions operate on a special operand stack. This means that some of instructions can push values to the operand stack whereas others pop values from it, operate on the data, and push results back on the stack.

When a JIT compiler translates bytecode into native code, the stack based instructions are converted to register based instructions with the use of something called the mimic stack that emulates the operand stack at compile time. The way mimic stack works is described in Section 2.2 ("Lazy code selection") of the paper Fast, effective code generation in a just-in-time Java compiler (1998) by Adl-Tabatabai et al.

There's one thing that complicates the process: the operand stack can be non-empty at the entry and exit of a basic block. One example where this happens is when bytecode is generated for the ternary operator (i.e. the a ? b : c expression). For example, for the following piece of code:

public class Ternary {
public static int ternary(boolean a, int b, int c) {
return a ? b : c;

the generated bytecode looks like this:

public static int ternary(boolean, int, int);
# B1:
0: iload_0
1: ifeq 8
# B2:
4: iload_1
5: goto 9
# B3:
8: iload_2
# B4:
9: ireturn

Looking at it, the code above has four basic blocks: the first one covers offsets 0-4 ("B1"), the second one covers 4-8 ("B2"), the third one 8-9 ("B3"), and the final one covers 9-10 ("B4"). As the method is marked as static, the boolean a is local variable 0, b is local variable 1, and c is local variable 2.

Now, the iload_0 and ifeq 8 instruction pair branches to B3 if a == false; otherwise execution continues in B2. In B3, the iload_2 instruction pushes local variable c on the operand stack and we then fall through to B4 where ireturn pops one value from the operand stack and uses that as the return value. On the other hand, if a == true, we continue execution in B2 where the iload_1 instruction pushes local variable b on the operand stack and jumps unconditionally to B4 where ireturn uses that as a return value.

We can now see that the operand stack is empty at entry and exit in B1 but in B2 and B3, it's non-empty at exit whereas in B4, the basic block is non-empty at entry. The JIT compiler needs to spill any non-empty mimic stack to either memory or registers at the exit of a basic block (the paper mentions that they spill to memory only) and pre-populate the mimic stack for basic blocks that are non-empty at entry based on the spilled values to handle the boundary conditions.