The road back to par: Radical reconstructive surgery planned for Firefox 4.0
It was a superb idea. But not even a whole year after its launch, Mozilla engineers are already noting the design may be obsolete, even outmoded, and maybe in retrospect not even particularly well-executed.
In typically introspective Mozilla fashion, JIT compiler engineer Dave Anderson came out with it last week on his blog for the organization, fittingly entitled "Mystery Bail Theater": TraceMonkey is very good, except for those times when it's very not.
"TraceMonkey is pretty powerful. It carefully observes loops and converts them to super-fast assembly," Anderson wrote. "That's great and all, but there's a problem: sometimes tracing doesn't work. Loops can throw curveballs that cause tracing to stop. Especially with recursion, or lots of nesting, it can be very difficult to build good traces on complex code."
A mockup of the latest UI changes planned for Firefox 4.0, which now include a relocated Home button and a new Bookmarks button. [Courtesy Mozilla]
The absolute timetable for the production of Firefox 4.0 isn't known. That's not because Mozilla keeps it under wraps; it simply hasn't been determined yet. For now, the best anyone can see is a cloudy window in the latter part of the year. The spotlight feature of 4.0, up to this point, is said to be its completely remodeled front end, borrowing more minimalist ideas from Google Chrome, Apple Safari, and now Opera. But now that speed is a principal criterion for browser users, the next point-release of Firefox will need to at least make up the performance ground that now separates it from all the other alternative (non-IE) browsers in the field.
Setting an absolute date may depend on how long it takes for the Firefox code base to recover, if you will, from an extensive surgical procedure -- quite literally, a graft. Users who've been wondering why Mozilla doesn't simply adopt WebKit as its browser engine will be interested to learn that it's actually working to embrace a piece of it -- not just in concept, but in actual code.
On paper, Nanojit's procedure should work faster than that of WebKit, the open source browser engine used by Safari and adopted by Chrome. It doesn't, for reasons Mozilla's Anderson attributes to the growing number of cases where code simply can't be compiled down.
An example of just one kind of fallback exception that happens frequently was described in a recent blog post by Mozilla contributor David Mandelin: "Tracing works by generating type-specialized native code for program paths. So if a program has 1000 paths in its hottest loop, TraceMonkey would have to generate 1000 paths to run it natively with tracing. But that would use up way too much memory for code, so instead TraceMonkey stops after a certain number of paths and falls back to the interpreter."
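The path budget Mandelin describes can be pictured with a toy model. The sketch below is not TraceMonkey code: the class name `TraceCache`, the `max_paths` limit, and the idea of identifying a "path" by the tuple of operand type names are all assumptions made purely for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical model: a "path" is identified by the tuple of operand type names.
PathKey = Tuple[str, ...]


class TraceCache:
    """Toy per-loop cache of type-specialized paths with a fixed budget."""

    def __init__(self, max_paths: int) -> None:
        self.max_paths = max_paths  # assumed limit, not a real TraceMonkey value
        self.compiled: Dict[PathKey, Callable[..., object]] = {}
        self.fallbacks = 0  # how many times the slow generic route was taken

    def run(self, generic_op: Callable[..., object], *args: object) -> object:
        key = tuple(type(a).__name__ for a in args)
        fast = self.compiled.get(key)
        if fast is not None:
            return fast(*args)  # known path: reuse the "specialized" code
        if len(self.compiled) < self.max_paths:
            # Stand-in for compilation: remember a callable for this type combination.
            self.compiled[key] = generic_op
            return generic_op(*args)
        # Budget exhausted: stop generating paths and fall back to the interpreter.
        self.fallbacks += 1
        return generic_op(*args)


def add(a, b):
    return a + b


cache = TraceCache(max_paths=2)
results = [
    cache.run(add, 1, 2),       # path (int, int): compiled
    cache.run(add, 1.5, 2.5),   # path (float, float): compiled
    cache.run(add, "a", "b"),   # third distinct path exceeds the budget: fallback
    cache.run(add, 3, 4),       # (int, int) again: served from the cache
]
```

With a budget of two paths, the third distinct type combination is never cached, so every later visit to it pays the slow route again. That is the kind of fallback the quote describes, scaled down from 1000 paths to three.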
By contrast, as Anderson described it, WebKit has a component called Nitro that compiles not individual traced paths, but big chunks of code -- entire methods. Its process is called inline threading.
In recent Betanews tests using, ironically, code whose corrections were suggested by Opera Software, we got a glimpse of the strengths and weaknesses of both approaches: When optimizing the very same loop a hundred thousand, or a million, times, Firefox's Nanojit methodology executes the resulting code much more quickly than any other browser, including the new Opera 10.5 and Chrome 5 dev builds. That's why, in this one test battery of ours, Firefox 3.7 Alpha 2 scores a 4.34 versus the WebKit/Safari daily build's score of 3.91, with every other competitor scoring lower still.
Repeating the same stuff over and over is one case where TraceMonkey, to use Anderson's analogy, applies rocket boosters. But real-world code doesn't repeat stuff a million times. In those cases, TraceMonkey's boosters never come on. Nitro ends up being more efficient -- it has optimizations for certain types of methods planned well in advance, whereas Nanojit starts from scratch.
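The trade-off can be made concrete with a back-of-the-envelope cost model. Everything in the sketch below is hypothetical: the function names, the idea of a "hot threshold" before tracing kicks in, and every cost figure (in arbitrary units) are invented for illustration and are not measurements of any browser.

```python
def tracing_cost(iterations, interp_cost, native_cost, compile_cost, hot_threshold):
    """Total cost of a loop under a toy tracing JIT (arbitrary units)."""
    if iterations <= hot_threshold:
        # The loop never gets hot, so the "boosters" never come on.
        return iterations * interp_cost
    warmup = hot_threshold * interp_cost
    return warmup + compile_cost + (iterations - hot_threshold) * native_cost


def method_jit_cost(iterations, method_cost, compile_cost):
    """Total cost when the whole method is compiled up front (arbitrary units)."""
    return compile_cost + iterations * method_cost


# Hypothetical parameters: tracing yields faster code but costs more to set up.
TRACE = dict(interp_cost=10, native_cost=1, compile_cost=500, hot_threshold=50)
METHOD = dict(method_cost=3, compile_cost=50)

short_trace = tracing_cost(100, **TRACE)          # 1050
short_method = method_jit_cost(100, **METHOD)     # 350
long_trace = tracing_cost(1_000_000, **TRACE)     # 1000950
long_method = method_jit_cost(1_000_000, **METHOD)  # 3000050
```

Under these made-up numbers, the tracing model wins comfortably on the million-iteration loop and loses on the hundred-iteration one, which is the shape of the argument above. The crossover point depends entirely on the assumed costs.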
The architectural solution, which is being improved upon even as I write this, is being called JaegerMonkey. Using this architecture, developers are replacing the TraceMonkey component with the inline threading component of Nitro (with permission), which will optimize the JS code the same way WebKit does. Then they will bolt TraceMonkey back onto the Nitro graft, to trace and improve the exceptions that Nitro cannot optimize. As Mandelin reports, Mozilla contributors such as Anderson and Luke Wagner have been making adjustments and improvements to Nitro, to make it fit better in the Mozilla scheme.
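One plausible reading of that layering is a two-tier dispatcher: whole-method compiled code as the baseline for everything, with tracing layered on top for loops that prove both hot and traceable. The sketch below is only a schematic of that idea. The `HybridEngine` class, its `hot_threshold`, and the tier labels are hypothetical and are not taken from the JaegerMonkey source.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HybridEngine:
    """Toy two-tier dispatcher: method JIT baseline, tracing for hot loops."""

    hot_threshold: int = 3  # hypothetical iteration count before tracing starts
    loop_counts: Dict[str, int] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def run_loop_iteration(self, loop_id: str, traceable: bool) -> str:
        # Count how often this loop has been entered.
        count = self.loop_counts.get(loop_id, 0) + 1
        self.loop_counts[loop_id] = count
        if traceable and count > self.hot_threshold:
            tier = "trace"       # hot and well-behaved: use the traced code
        else:
            tier = "method-jit"  # everything else still runs compiled code
        self.log.append(tier)
        return tier


engine = HybridEngine(hot_threshold=2)
hot_loop = [engine.run_loop_iteration("L1", traceable=True) for _ in range(4)]
messy_loop = [engine.run_loop_iteration("L2", traceable=False) for _ in range(4)]
```

The point of the design is that the worst case is no longer the interpreter. A loop that defeats tracing, such as `L2` here, still runs on the baseline compiled tier for every iteration.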
"We've barely started and the results are already really promising," reports Anderson. "Running SunSpider on my machine, the whole-method JIT is 30% faster than the interpreter on x86, and 45% faster on x64. This is with barely any optimization work!"
Sadly, the odd man out appears to be Nanojit, one of the components that was responsible for the huge speed gains in Firefox 3.5. The work of Mozilla contributor Nicholas Nethercote, Nanojit was being redeveloped for a new Firefox version as late as last month, with an eye toward optimizing the assembly-like code base that it generates.
As notes from Mozilla planning meetings indicate, the team considered using Nethercote's optimized Nanojit, but finally opted against it. The notes explain: "The main problem is that there is not enough control over the results to get the best code. In particular, there are tricks for calling stub functions (functions that implement JS ops that are not inlined) very efficiently that Nanojit doesn't currently support. We think there will be other tricks with manual register allocation and such that are also not currently supported. We don't want to gate this work on Nanojit development or junk Nanojit up with features that will be non-useful for its current applications. Also, the compilation time is much longer for LIR than for using an assembler."