Tomek fixed some bugs in Jato and now Clojure runs under the VM:
Wednesday, October 14, 2009
Clojure runs under Jato!
Lähettänyt
Pekka Enberg
klo
8:12 PM
0
kommenttia
Linkit tähän tekstiin
Tunnisteet: jato clojure
Sunday, October 11, 2009
Tetris runs under Jato!
Tomek fixed some bugs and now Tetris runs under Jato:
In related news, Clojure shell starts up under Jato as well (and dies an ugly death if you actually try to us it for something).
Lähettänyt
Pekka Enberg
klo
10:16 PM
0
kommenttia
Linkit tähän tekstiin
Tunnisteet: jato
Wednesday, September 9, 2009
Jato v0.0.1 released
12:04 < vegard> so we pop the champagne now?
For further details, check out the official release announcement. Thanks to everyone involved during the past four years in making this happen! I hope that it doesn't take another four years to get version 1.0 out of the door...
Lähettänyt
Pekka Enberg
klo
4:46 PM
0
kommenttia
Linkit tähän tekstiin
Tunnisteet: jato
Sunday, August 23, 2009
Full JVM instruction set coverage on 32-bit x86
Jato reached 100% coverage of the JVM instruction set on 32-bit x86 today. We have plenty of bugs and missing features that prevent you from running most Java programs out there. However, reaching 100% coverage is an important milestone because you can now try to run your favorite Java applications under Jato and report bugs to us when you see the VM crash and burn!
Lähettänyt
Pekka Enberg
klo
12:21 AM
0
kommenttia
Linkit tähän tekstiin
Tunnisteet: jato
Tuesday, August 11, 2009
Hello, World (the Swing edition)!
The Hello World Swing example makes perfect screenshot material to show off Jato's amazing progress:
Pretty cool, eh?
Lähettänyt
Pekka Enberg
klo
9:39 AM
1 kommenttia
Linkit tähän tekstiin
Friday, August 7, 2009
Optimizing System.out.println() in Jato (...or how to reduce execution time from 10.5 seconds down to 0.3 seconds)
I am a big fan of the "first make it work, then make it better" principle and that's what we've been doing with Jato. However, as we were getting System.out.println() mostly working, we found ourselves in the "then make it better" situation because running a simple "hello, world" program took a whopping 10.5 seconds:
penberg@penberg-laptop:~/testing/jato$ time jato Hello
[snip]
real 0m12.680s
user 0m10.544s
sys 0m1.027s
As it was pretty evident something was seriously broken, I turned to the shiny new performance counters tool that's part of Linux 2.6.31 release candidate kernels:
penberg@penberg-laptop:~/testing/jato$ perf record -f jato -Xperf Hello
[snip]
[ perf record: Captured and wrote 2.665 MB perf.data (~116455 samples) ]
penberg@penberg-laptop:~/testing/jato$ perf report | head -10
# Samples: 116267
#
# Overhead Command Shared Object Symbol
# ........ ....... ....................................... ......
#
37.71% jato /home/penberg/bin/jato [.] analyze_liveness
16.27% jato /home/penberg/bin/jato [.] insn_defs
12.61% jato /home/penberg/bin/jato [.] test_bit
12.21% jato /home/penberg/bin/jato [.] insn_uses
6.30% jato /home/penberg/bin/jato [.] split_interval_at
The profile clearly indicates that we were spending 66.19% of the execution time in liveness analysis (functions
analyze_liveness, insn_defs, and insn_uses). It's also likely that much of the 12.61% time spent in test_bit() was originated from liveness analysis as well!After looking at the trace for few seconds, I went for the the sucker optimization of commit a8b16eb ("lib: Make test_bit() an inline function") which helped but not much:
penberg@penberg-laptop:~/testing/jato$ time jato Hello
[snip]
real 0m12.636s
user 0m9.723s
sys 0m1.050s
Luckily, I heard the voice of an ex-colleague of mine in my head saying "always optimize the _top-most_ functions in a profile report _first_" and turned my attention to the real problem. Reading the code, I was able to determine that we were updating live ranges way too many times in the wrong place. As I tried to fix it, I kept seeing crashes which turned out to be actual bugs in our use-def code!
After Arthur Huillet and Vegard Nossum fixed the use-def bugs, I did the first serious optimization in commit 8aa6c4f ("jit: optimize __analyze_use_def()") bringing the execution time from 9.7 seconds to 6.3 seconds:
penberg@penberg-laptop:~/testing/jato$ time jato Hello
[code]
real 0m7.266s
user 0m6.389s
sys 0m0.159s
While 6.3 seconds is better, it's not amazingly good so I then turned to profile report again:
[snip]
# Samples: 66106
#
# Overhead Command Shared Object Symbol
# ........ ....... ....................................... ......
#
33.76% jato /home/penberg/bin/jato [.] insn_defs
21.72% jato /home/penberg/bin/jato [.] __update_interval_insn
20.91% jato /home/penberg/bin/jato [.] insn_uses
14.51% jato /home/penberg/bin/jato [.] __analyze_use_def
1.42% jato /home/penberg/bin/jato [.] allocate_registers
It didn't took too long for Arthur Huillet to notice that we were calling
for_each_variable() (where the number of iterations can be very big in pathological cases) to find instructions that use or define a temporary. He suggested turning the code the other way around and find all the temporaries of an instruction which I and Vegard Nossum implemented in commit f0438e0 ("jit: Optimize __analyze_use_def()"). The change brought us from 6.3 down to 1.8 seconds:penberg@penberg-laptop:~/testing/jato$ time jato Hello
[time]
real 0m2.228s
user 0m1.861s
sys 0m0.166s
I was feeling pretty happy about the improvement so I decided to see how much we were losing to Hotspot:
penberg@penberg-laptop:~/testing/jato$ java -version
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) Client VM (build 11.0-b15, mixed mode, sharing)
penberg@penberg-laptop:~/testing/jato$ time java Hello
hello, world
real 0m0.268s
user 0m0.040s
sys 0m0.018s
Ouch! It took 0.04 seconds for Hotspot to do what we did in 1.8 seconds so I fired up the profiler again:
# Samples: 20298
#
# Overhead Command Shared Object Symbol
# ........ ....... ....................................... ......
#
69.71% jato /home/penberg/bin/jato [.] __update_interval_insn
4.25% jato /home/penberg/bin/jato [.] allocate_registers
4.19% jato [kernel] [k] get_page_from_freelist
1.68% jato /lib/tls/i686/cmov/libc-2.8.90.so [.] _int_malloc
1.66% jato /home/penberg/bin/jato [.] split_interval_at
Looking at the hotspot in
__update_interval_insn() we saw yet another for_each_variable() loop (I told you it was expensive). Vegard Nossum replaced the loop with a proper data structure in commit5adab71 ("jit: introduce per-cu lir -> insn mapping") and was able to bring us down to 0.3 seconds:
penberg@penberg-laptop:~/testing/jato$ time jato Hello
hello, world
real 0m0.530s
user 0m0.375s
sys 0m0.077s
There's still more room to improve but we've turned our attention to fixing bugs as the level of performance is pretty satisfactory at this point. I did take a look at the profile one last time:
# Samples: 4429
#
# Overhead Command Shared Object Symbol
# ........ ....... ....................................... ......
#
15.69% jato /home/penberg/bin/jato [.] insert_to_list
6.84% jato /lib/tls/i686/cmov/libc-2.8.90.so [.] _int_malloc
6.62% jato [kernel] [k] get_page_from_freelist
6.34% jato /lib/tls/i686/cmov/libc-2.8.90.so [.] __GI_strcmp
3.97% jato /home/penberg/bin/jato [.] zalloc
2.73% jato /lib/tls/i686/cmov/libc-2.8.90.so [.] __GI___libc_malloc
The current bottleneck function
insert_to_list() is another example of using a simple loop instead of a proper data structure, this time in the register allocator. It's also interesting to see how malloc() (_int_malloc) and the kernel page allocator (get_page_from_freelist) total at 10.8% of the execution time. This hardly surprising as Jato is very memory hungry and doesn't have a garbage collector yet.While I'm happy and proud of the speed-up, it's not that impressive as the real challenge is to try to catch up with Hotspot which is pretty damn hard. That said, the whole exercise reinforced two fundamental rules of performance optimizations in me: (1) measure, don't guess and (2) always optimize the parts where your program spends most its time in. Sounds simple and trivial? Yes, but it's very easy to forget that when you're trying to make something go faster.
Lähettänyt
Pekka Enberg
klo
7:41 PM
0
kommenttia
Linkit tähän tekstiin
Tunnisteet: jato, performance
Watch for Falling Rocks...
Do you know the warning signs of a bad programmer?
Lähettänyt
Pekka Enberg
klo
1:34 PM
0
kommenttia
Linkit tähän tekstiin
Tunnisteet: programming