A blog about programming topics, mostly JVM, Linux kernel, and x86 related things.

Thursday, April 24, 2008

Specification for a Simple Compiler Written in Haskell

I've finished reading Programming in Haskell, so I thought I'd continue planning my C compiler written in Haskell to see what I need to learn next.

I want the first version of the compiler to understand a tiny but useful subset of the C programming language, and what could be more fun than the "hello world" program? I've simplified the canonical example in two ways. I stripped the #include directives, which require a preprocessor. I also changed printf() to puts() to avoid the hassle of variable arguments:

extern int puts(const char *s);

int main(int argc, char *argv[])
{
        puts("hello world");
        return 0;
}

To make the compiler as simple to debug as possible, it should generate assembly as output. I looked at the (slightly cleaned-up) output of gcc -save-temps on x86-64. Based on that, we need to generate the following for the "hello world" program above:

        .file   "hello.c"
        .section .rodata
.LC0:
        .string "hello world"
        .text
        .globl  main
        .type   main, @function
main:
        # Function prologue
        pushq   %rbp
        movq    %rsp, %rbp

        # puts("hello world");
        movl    $.LC0, %edi
        call    puts

        # return 0;
        movl    $0, %eax

        # Function epilogue
        leave
        ret
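For this one program, the listing is regular enough that the back-end can be little more than a template. Below is a minimal sketch in C. C is the language being compiled; the compiler itself is planned in Haskell. The helper emit_hello() and its parameters are hypothetical. It fills a caller-supplied buffer with snprintf() and assumes the message needs no escaping:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical back-end helper: write the assembly for a main() that
 * calls puts(message) and returns ret_value into buf, following the
 * gcc -save-temps listing. Returns what snprintf() returns (the full
 * length, even if truncated), or -1 if message would need escaping.
 */
int emit_hello(char *buf, size_t size, const char *file,
               const char *message, int ret_value)
{
    assert(buf != NULL && file != NULL && message != NULL);

    /* This sketch does no escaping, so reject quotes and backslashes. */
    if (strpbrk(message, "\"\\") != NULL)
        return -1;

    return snprintf(buf, size,
                    "\t.file\t\"%s\"\n"
                    "\t.section\t.rodata\n"
                    ".LC0:\n"                   /* label for the string literal */
                    "\t.string\t\"%s\"\n"
                    "\t.text\n"
                    "\t.globl\tmain\n"
                    "\t.type\tmain, @function\n"
                    "main:\n"
                    "\tpushq\t%%rbp\n"          /* prologue */
                    "\tmovq\t%%rsp, %%rbp\n"
                    "\tmovl\t$.LC0, %%edi\n"    /* first integer argument */
                    "\tcall\tputs\n"
                    "\tmovl\t$%d, %%eax\n"      /* return value in %eax */
                    "\tleave\n"                 /* epilogue */
                    "\tret\n",
                    file, message, ret_value);
}
```

For this program, the ./compile step in the pipeline below has to do roughly this: write the filled buffer to standard output with fputs(). A real back-end would instead walk the intermediate representation and emit one instruction sequence per construct.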

Finally, to generate a runnable executable, we simply do the following (with a little help from gcc):

cpp hello.c | ./compile > hello.S ; gcc -o hello hello.S

Now, looking at this specification, it's obvious that I need to write a lexer and a parser. I also need to design an intermediate representation. The front-end converts source files into it, and the back-end then generates x86-64 assembly code from it. The back-end is straightforward to implement, so I guess I need to start learning Parsec for real now.

Tuesday, April 22, 2008

Summer of Code Is On

The Summer of Code projects have now officially started! Arthur Huillet is going to work on 64-bit integer support on i386 (the long data type), and Stanislav Muhametsin will be tackling switch statements (among other things). Congratulations!

I would also like to thank Saeed M. Abdullah for adding support for the anewarray bytecode even though his project didn't receive a slot this year. Thank you, Saeed!

On a related note, I'm also mentoring Eduard-Gabriel Munteanu for his work on kmemtrace, a kernel memory profiler. His project is under the umbrella of the Linux Foundation.

Hopefully we'll be hearing more of these projects soon!

Wednesday, April 16, 2008

HotSpot Internals

John Rose has announced a wiki for HotSpot Internals that has an interesting overview of HotSpot optimizations and some recommendations on how to write microbenchmarks.