A blog about programming topics, mostly JVM, Linux kernel, and x86 related things.

Saturday, January 2, 2010

libcpu x86 front-end

I stumbled across the libcpu project a few days ago. It's an "anything to anything" binary translation library built on top of LLVM: LLVM does the actual JIT code generation, and the libcpu parts handle instruction decoding for the various architectures. Needless to say, I thought the concept was pretty damn cool and started writing an x86 front-end for fun.

Decoding x86 instructions is not as hard as it might seem at first. There are plenty of examples around: Vegard Nossum's kmemcheck code in the Linux kernel has a simple decoder, and there's also a more complete decoder in KVM.

I am targeting the 8086 instruction set first because it's pretty small and well-contained. The modern x86 instruction set is huge because it includes things like MMX and SSE. Luckily, there's quite a lot of overlap between 8086, i386, and x86-64 instruction decoding, so hopefully we can reuse much of the 8086 code for full x86 support.
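To illustrate the kind of overlap I mean, here is a rough sketch in Python (not libcpu code; the function name and its parameters are made up for this post). It decodes the "MOV register, immediate" opcodes 0xB8..0xBF for both 16-bit and 32-bit default operand sizes. The only difference between the two is the width of the immediate, which the 0x66 operand-size prefix (introduced with the i386, so not part of the original 8086) toggles. x86-64 is left out of the sketch.

```python
# Illustrative sketch only: names are hypothetical, not taken from libcpu.
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def decode_mov_imm(code, mode_bits):
    """Decode 'MOV reg, imm' (opcodes 0xB8..0xBF) in 16-bit or 32-bit mode.

    Returns (register name, immediate value, instruction length in bytes).
    """
    pos = 0
    size = mode_bits
    if code[pos] == 0x66:
        # The operand-size prefix flips between 16-bit and 32-bit operands.
        size = 32 if mode_bits == 16 else 16
        pos += 1
    opcode = code[pos]
    if not 0xB8 <= opcode <= 0xBF:
        raise NotImplementedError("only MOV reg, imm is handled here")
    pos += 1
    nbytes = size // 8
    imm = int.from_bytes(code[pos:pos + nbytes], "little")
    name = REG16[opcode & 7]          # low three opcode bits select the register
    if size == 32:
        name = "e" + name
    return name, imm, pos + nbytes

print(decode_mov_imm(bytes([0xB8, 0x01, 0x00]), 16))
print(decode_mov_imm(bytes([0xB8, 0x01, 0x00, 0x00, 0x00]), 32))
```

The register lookup and the opcode check are shared; only the immediate width depends on the mode, which is why I'm hopeful about reusing the 8086 code.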

I have managed to write decoding and disassembly routines for only a few x86 instructions so far, so if you're interested in x86 binary translation, just grab the Intel manuals and libcpu sources and start hacking! The instruction format is described in Chapter 2 ("Instruction Format") of Volume 2A ("Instruction Set Reference, A-M"). The x86-specific code is in the arch/x86 directory of libcpu.
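The piece of the instruction format you run into first is the ModRM byte, which packs three fields into eight bits. As a minimal sketch (plain Python for illustration; the helper name is my own and not something from libcpu), splitting it apart looks like this:

```python
# Illustrative only: these names are made up for this post, not taken from libcpu.
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0x3   # bits 7..6: addressing mode (3 = register direct)
    reg = (byte >> 3) & 0x7   # bits 5..3: register operand or opcode extension
    rm = byte & 0x7           # bits 2..0: register or memory operand
    return mod, reg, rm

mod, reg, rm = split_modrm(0xD9)   # 0xD9 == 0b11_011_001
print(mod, REG16[reg], REG16[rm])  # prints: 3 bx cx
```

With mod equal to 3, both operands are registers, which is the only case a first cut of the decoder needs to handle.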

I am currently using the following assembly snippet as a test case:

        .code16gcc
        .text
        .globl _start
        .type _start, @function
_start:
        movw $1, %ax
        movw $2, %bx
        movw $3, %cx
        movw $4, %dx

        movw %ax, %bx
        movw %bx, %cx
        movw %cx, %dx
        movw %dx, %ax

        retw


You can use the following makefile to compile the above snippet into a flat binary:

NAME := i8086

BIN := $(NAME).bin
ELF := $(NAME).elf
OBJ := $(NAME).o

all: $(BIN)

$(BIN): $(ELF)
	objcopy -O binary $< $@

$(ELF): $(OBJ)
	ld -nostdlib -static $< -o $@

%.o: %.S
	gcc -nostdinc -c $< -o $@

clean:
	rm -f $(BIN) $(ELF) $(OBJ)
.PHONY: clean


If you copy the resulting i8086.bin into the test/bin/x86/ directory of libcpu, you can run the generated code with the following commands in the libcpu root:

make
./test/scripts/8086.sh


The makefile also generates an i8086.elf file that you can use with objdump to verify that the generated binary contains sane code. Just remember to use the "-m i8086" command line option with objdump to let it know we're dealing with 16-bit code:


$ objdump -d -m i8086 i8086.elf

i8086.elf: file format elf32-i386


Disassembly of section .text:

08048054 <_start>:
 8048054:   b8 01 00    mov    $0x1,%ax
 8048057:   bb 02 00    mov    $0x2,%bx
 804805a:   b9 03 00    mov    $0x3,%cx
 804805d:   ba 04 00    mov    $0x4,%dx
 8048060:   89 c3       mov    %ax,%bx
 8048062:   89 d9       mov    %bx,%cx
 8048064:   89 ca       mov    %cx,%dx
 8048066:   89 d0       mov    %dx,%ax
 8048068:   c3          ret
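To give a feel for how little decoding this test case actually needs, here is a toy disassembler in Python that handles just the three instruction forms above: MOV with a 16-bit immediate (opcodes 0xB8..0xBF), MOV between registers (opcode 0x89 with a register-direct ModRM byte), and a near RET (0xC3). It is a sketch for illustration only. The function name is made up, and it is not how the libcpu front-end is structured.

```python
# Toy disassembler for the test case; hypothetical names, not libcpu code.
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def disassemble(code):
    """Return a list of (offset, text) pairs in AT&T syntax."""
    out = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        if 0xB8 <= opcode <= 0xBF:
            # MOV r16, imm16: register in the low opcode bits, little-endian immediate.
            imm = code[pc + 1] | (code[pc + 2] << 8)
            text = f"mov ${imm:#x},%{REG16[opcode & 7]}"
            length = 3
        elif opcode == 0x89:
            # MOV r/m16, r16: the reg field is the source, the rm field the destination.
            modrm = code[pc + 1]
            mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
            if mod != 3:
                raise NotImplementedError("memory operands are not handled")
            text = f"mov %{REG16[reg]},%{REG16[rm]}"
            length = 2
        elif opcode == 0xC3:
            text = "ret"
            length = 1
        else:
            raise NotImplementedError(f"unknown opcode {opcode:#x}")
        out.append((pc, text))
        pc += length
    return out

# The bytes from the objdump listing; reading i8086.bin from disk works the same way.
code = bytes.fromhex("b80100 bb0200 b90300 ba0400 89c3 89d9 89ca 89d0 c3")
for offset, text in disassemble(code):
    print(f"{offset:4x}: {text}")
```

The offsets it prints are relative to the start of the flat binary, so they differ from the objdump addresses by the ELF load address.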


Happy hacking!

3 comments:

Chris Lattner said...

FYI, mainline LLVM already has an X86 disassembler; take a look at llvm/test/MC/Disassembler/simple-tests.txt, which drives a simple command line test harness for it.

Pekka Enberg said...

OK, I'll take a look at it. Thanks, Chris!

Stephen Leary said...

Hey guys... libcpu.org seems to be down?

Project dead?