A blog about programming topics, mostly JVM, Linux kernel, and x86 related things.

Saturday, January 9, 2010

Java Virtual Machine developer documentation

So you want to be a Java virtual machine hacker but don't know where to start? One obvious starting point is to navigate your way through the large body of developer documentation.

For a JVM developer, the most important document is The Java Virtual Machine Specification and its companion document Clarifications and Amendments. The specification describes the ABI of the JVM but doesn't mandate how it is implemented. While you should read through the whole thing, the most interesting chapters to a JVM hacker are Chapter 4 ("The class File Format") and Chapter 6 ("The Java Virtual Machine Instruction Set"). It's also important to understand JSR-133 ("Java memory model"), which describes how threads in the JVM interact through memory. Doug Lea's The JSR-133 Cookbook for Compiler Writers has a pretty nice summary of the specification from a JVM hacker's point of view.
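To get a feel for Chapter 4, here's a minimal C sketch that reads the fixed-size start of a class file: the magic number 0xCAFEBABE followed by minor_version, major_version and constant_pool_count, all stored big-endian. The struct and function names are my own for illustration and don't come from any particular JVM.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-size prefix of the ClassFile structure (JVM spec, Chapter 4). */
struct classfile_header {
	uint32_t magic;			/* must be 0xCAFEBABE */
	uint16_t minor_version;
	uint16_t major_version;
	uint16_t constant_pool_count;
};

/* Class files store u2 and u4 values in big-endian byte order. */
static uint16_t read_u2(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_u4(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if buf is too short or lacks the magic number. */
static int parse_classfile_header(const uint8_t *buf, size_t len,
				  struct classfile_header *hdr)
{
	if (len < 10)			/* u4 + u2 + u2 + u2 */
		return -1;

	hdr->magic = read_u4(buf);
	if (hdr->magic != 0xCAFEBABEu)
		return -1;

	hdr->minor_version = read_u2(buf + 4);
	hdr->major_version = read_u2(buf + 6);
	hdr->constant_pool_count = read_u2(buf + 8);
	return 0;
}
```

A real class loader would keep going from here into the constant pool, whose entries are variable-length and have to be walked one tag at a time.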

There are two additional things that the JVM needs to run serious programs: JNI support and runtime classes. The Java Native Interface Specification describes the interoperability interface between the JVM and native code. Sun's HotSpot comes with its own runtime classes, but many open source JVMs use GNU Classpath to provide the essential classes. The GNU Classpath Virtual Machine Integration Guide describes the API that a VM needs to implement to integrate with GNU Classpath.

While the above documentation describes what a JVM needs to provide, it's also important to understand how real-world JVMs are implemented to see the big picture. Design of the Java HotSpot Client Compiler for Java 6 and The Jalapeño Dynamic Optimizing Compiler for Java provide a nice overview of how HotSpot and Jikes RVM are implemented, respectively. There's a more extensive list of papers on the Jato home page, but there's no substitute for reading JVM code if you really want to understand the implementation.

Jato v0.0.2 released

I just released version 0.0.2 of Jato. See the release announcement for details.

Sunday, January 3, 2010

The Wonders of Proprietary Development Tools

Apparently the highly optimizing Intel compilers have "cripple AMD" functionality built right into them. To make a long story short, Intel compilers generate multiple versions of a piece of code and select the "appropriate" version at run time based on your CPU vendor. Unsurprisingly, the versions used on AMD and VIA CPUs don't use the full potential of the chips (apparently no SSE, for example) and thus perform much worse than the version used on Intel CPUs.
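I have no idea how Intel's dispatcher is implemented internally, but the general idea of run-time dispatch is easy to sketch. The C code below assumes GCC or Clang's <cpuid.h> (the __get_cpuid() helper) on x86. sum_generic() and sum_fast() are hypothetical stand-ins for a baseline and an optimized code path; the "fast" one is plain C here, not real SSE code. select_by_vendor() picks a path from the CPUID vendor string, which is the kind of behavior described above, while select_by_feature() checks the SSE2 bit (CPUID leaf 1, EDX bit 26) so that any CPU that actually supports the feature gets the fast path.

```c
#include <stddef.h>
#include <string.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical baseline implementation. */
static long sum_generic(const int *v, size_t n)
{
	long s = 0;

	for (size_t i = 0; i < n; i++)
		s += v[i];
	return s;
}

/* Hypothetical stand-in for an SSE-optimized version (plain C here). */
static long sum_fast(const int *v, size_t n)
{
	long s0 = 0, s1 = 0;
	size_t i = 0;

	for (; i + 1 < n; i += 2) {
		s0 += v[i];
		s1 += v[i + 1];
	}
	if (i < n)
		s0 += v[i];
	return s0 + s1;
}

/* Copies the 12-character CPUID vendor string (e.g. "GenuineIntel" or
 * "AuthenticAMD") into vendor; leaves it empty on non-x86 builds. */
static void cpu_vendor(char vendor[13])
{
	memset(vendor, 0, 13);
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
		/* The vendor string is stored in EBX, EDX, ECX order. */
		memcpy(vendor, &ebx, 4);
		memcpy(vendor + 4, &edx, 4);
		memcpy(vendor + 8, &ecx, 4);
	}
#endif
}

/* Returns 1 if CPUID.01H:EDX bit 26 (SSE2) is set. */
static int cpu_has_sse2(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return (edx >> 26) & 1;
#endif
	return 0;
}

typedef long (*sum_fn)(const int *, size_t);

/* Vendor-based dispatch: the fast path only runs on Intel parts. */
static sum_fn select_by_vendor(void)
{
	char vendor[13];

	cpu_vendor(vendor);
	return strcmp(vendor, "GenuineIntel") == 0 ? sum_fast : sum_generic;
}

/* Feature-based dispatch: the fast path runs on any CPU with SSE2. */
static sum_fn select_by_feature(void)
{
	return cpu_has_sse2() ? sum_fast : sum_generic;
}
```

Both selectors compute the same result; the only difference is which CPUs get the optimized path, and with the vendor check that decision has nothing to do with what the chip can actually do.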

Now, I don't know whether this is intentional or not, but it sure is a nice reminder of why Open Source development tools are so important. If such a fault were found in, say, GCC, you'd better believe an AMD engineer would be submitting a patch to fix it as soon as possible. But with proprietary tools, you're at the mercy of your vendor, and unfortunately what's best for your vendor is not always what's best for you. Intel's core business is to sell as many chips as possible, so you can be damn sure they don't want to voluntarily spend money on making their proprietary compiler work well on AMD CPUs.

That's why it's so important that we have Open Source and Free Software infrastructure like the Linux kernel and GCC. As a developer, you can be reasonably sure no one is pulling dirty tricks on you and that all vendors are treated more or less equally based on their merits alone. And if you do find a fault that's not getting fixed, you have the luxury of fixing the damn problem yourself.

Saturday, January 2, 2010

libcpu x86 front-end

I stumbled across the libcpu project a few days ago. It's an "anything to anything" binary translation library built on top of LLVM. This means that LLVM does the actual JIT code generation and the libcpu parts handle instruction decoding for the various architectures. Needless to say, I thought the concept was pretty damn cool and started writing an x86 front-end for fun.

Decoding x86 instructions is not as hard as it might seem at first. There are plenty of examples around: Vegard Nossum's kmemcheck code in the Linux kernel has a simple decoder, and there's also a more complete decoder in KVM.

I am targeting the 8086 instruction set first because it's pretty small and well-contained. The modern x86 instruction set is huge because it includes extensions like MMX and SSE. Luckily, there's quite a lot of overlap between 8086, i386, and x86-64 instruction decoding, so hopefully we can reuse much of the 8086 code for full x86 support.
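As a small illustration of that overlap: the same B8+r opcode means "MOV reg, imm" in both 16-bit and 32-bit code, but the immediate is 16 bits wide in 8086 code and 32 bits wide in i386 code, and the 0x66 operand-size prefix (an i386 addition, not present on a real 8086) switches between the two. The helper below is a hypothetical sketch, not libcpu code, that computes the length of such an instruction; x86-64 REX prefixes are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the length in bytes of a "MOV reg, imm" instruction (opcode
 * B8+r) at code[0..len), or 0 if it isn't one or is truncated.
 * code_bits is the default operand size of the current mode: 16 for
 * 8086 real-mode code, 32 for i386 protected-mode code. */
static size_t mov_imm_length(const uint8_t *code, size_t len, int code_bits)
{
	size_t i = 0;
	int opsize = code_bits;

	/* 0x66 selects the non-default operand size; repeating the
	 * prefix does not toggle it back. */
	while (i < len && code[i] == 0x66) {
		opsize = (code_bits == 16) ? 32 : 16;
		i++;
	}

	if (i >= len || code[i] < 0xb8 || code[i] > 0xbf)
		return 0;

	/* prefixes + opcode byte + immediate */
	size_t insn_len = i + 1 + (size_t)(opsize / 8);

	return insn_len <= len ? insn_len : 0;
}
```

The opcode lookup is identical in both modes; only the operand size, and therefore the immediate length, depends on the mode and prefixes, which is why a decoder written for 8086 can hopefully be extended rather than rewritten.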

So far I have written decoding and disassembly routines for only a few x86 instructions, so if you're interested in x86 binary translation, just grab the Intel manuals and the libcpu sources and start hacking! The instruction format is described in Chapter 2 ("Instruction Format") of Volume 2A ("Instruction Set Reference, A-M"). The x86-specific code is in the arch/x86 directory of libcpu.

I am currently using the following assembly snippet as a test case:

.code16gcc
.text
.globl _start
.type _start, @function
_start:
movw $1, %ax
movw $2, %bx
movw $3, %cx
movw $4, %dx

movw %ax, %bx
movw %bx, %cx
movw %cx, %dx
movw %dx, %ax

retw


You can use the following makefile to compile the above snippet into a flat binary (save the snippet as i8086.S, since the makefile builds $(NAME).o from $(NAME).S):

NAME := i8086

BIN := $(NAME).bin
ELF := $(NAME).elf
OBJ := $(NAME).o

all: $(BIN)

$(BIN): $(ELF)
	objcopy -O binary $< $@

$(ELF): $(OBJ)
	ld -nostdlib -static $< -o $@

%.o: %.S
	gcc -nostdinc -c $< -o $@

clean:
	rm -f $(BIN) $(ELF) $(OBJ)
.PHONY: all clean


If you copy the resulting i8086.bin into the test/bin/x86/ directory of libcpu, you can run the generated code with the following commands in the libcpu root directory:

make
./test/scripts/8086.sh


The makefile also generates an i8086.elf file that you can use with objdump to verify that the generated binary contains sane code. Just remember to pass the "-m i8086" option to objdump so that it knows we're dealing with 16-bit code:


$ objdump -d -m i8086 i8086.elf

i8086.elf: file format elf32-i386


Disassembly of section .text:

08048054 <_start>:
8048054: b8 01 00 mov $0x1,%ax
8048057: bb 02 00 mov $0x2,%bx
804805a: b9 03 00 mov $0x3,%cx
804805d: ba 04 00 mov $0x4,%dx
8048060: 89 c3 mov %ax,%bx
8048062: 89 d9 mov %bx,%cx
8048064: 89 ca mov %cx,%dx
8048066: 89 d0 mov %dx,%ax
8048068: c3 ret
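As a rough illustration of what the decoder has to do for this test case, here's a toy C disassembler that handles only the three encodings above: MOV r16, imm16 (B8+rw followed by a little-endian 16-bit immediate), MOV r/m16, r16 (opcode 89 followed by a ModR/M byte, register operands only) and RET (C3). The function name and interface are hypothetical and not libcpu's API; prefixes and memory operands are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 16-bit general-purpose registers in encoding order (0..7). */
static const char *const reg16[8] = {
	"ax", "cx", "dx", "bx", "sp", "bp", "si", "di"
};

/* Disassembles one instruction at code[0..len) into out using AT&T
 * syntax. Returns the instruction length in bytes, or 0 if the
 * encoding is not handled by this sketch or is truncated. */
static size_t disasm_8086(const uint8_t *code, size_t len,
			  char *out, size_t outlen)
{
	if (len < 1)
		return 0;

	uint8_t op = code[0];

	if (op >= 0xb8 && op <= 0xbf) {		/* MOV r16, imm16 */
		if (len < 3)
			return 0;
		unsigned int imm = code[1] | (code[2] << 8);
		snprintf(out, outlen, "mov $0x%x,%%%s", imm, reg16[op - 0xb8]);
		return 3;
	}

	if (op == 0x89) {			/* MOV r/m16, r16 */
		if (len < 2)
			return 0;
		uint8_t modrm = code[1];
		/* ModR/M: mod = bits 7-6, reg = bits 5-3, rm = bits 2-0 */
		if ((modrm >> 6) != 3)		/* memory operand: not handled */
			return 0;
		snprintf(out, outlen, "mov %%%s,%%%s",
			 reg16[(modrm >> 3) & 7], reg16[modrm & 7]);
		return 2;
	}

	if (op == 0xc3) {			/* near RET */
		snprintf(out, outlen, "ret");
		return 1;
	}

	return 0;
}
```

Calling disasm_8086() in a loop over the 21 bytes of i8086.bin, advancing by the returned length each time, should produce the same nine instructions as the objdump listing, minus objdump's column alignment.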


Happy hacking!