The Betanews Comprehensive Relative Performance Index: How it works and why

The additions and changes we've made to our performance index

  • The Mozilla 3D cube by Simon Speich (Testcube 3D) is an unusual discovery from an unusual source: an independent Swiss developer who devised a simple, quick test of DHTML 3D rendering while researching the origins of a bug in Firefox. That bug has since been addressed, but the test fulfills a useful function for us: It fills precisely the gap in our test suite we'd been needing to fill, testing graphical dynamic HTML rendering alone -- which is finally becoming more important thanks to more capable JavaScript engines. And it's not weighted toward Mozilla -- it's a fair test of anyone's DHTML capabilities. There are two simple heats, each of which draws an ordinary wireframe cube and rotates it in space, accounting for forward-facing surfaces.

    Each heat produces a set of five results: the total elapsed time, the amount of that time spent actually rendering the cube, the average time each loop takes during rendering, and the elapsed times in milliseconds of the fastest and slowest loops. We average those last two into a single figure, which, along with the other three times, is compared against IE7's results to yield a comparative index score, as sketched below.
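    Here's a rough sketch, in Python, of how that comparison could work. The weighting and the field names are our illustration, not Speich's code; all that comes from the test itself is the set of five timings and the idea that each figure is scored relative to IE7.

    ```python
    # Illustrative sketch only: turning one TestCube 3D heat's timings into a
    # relative index score. Assumes each figure is scored as IE7's time divided
    # by the tested browser's time, so faster-than-IE7 results score above 1.0.

    def testcube_index(browser: dict, ie7: dict) -> float:
        """browser/ie7: timings in milliseconds for one heat (field names are made up)."""
        browser, ie7 = dict(browser), dict(ie7)
        # Fold the fastest and slowest loop times into a single figure.
        for t in (browser, ie7):
            t["loop_extremes"] = (t.pop("fastest_loop") + t.pop("slowest_loop")) / 2
        # Compare each of the four remaining figures against IE7, then average.
        ratios = [ie7[k] / browser[k] for k in browser]
        return sum(ratios) / len(ratios)

    # Made-up numbers: a browser twice as fast as IE7 across the board scores 2.0.
    ie7_heat = {"total": 4000, "render": 3500, "avg_loop": 35, "fastest_loop": 20, "slowest_loop": 90}
    fast_heat = {"total": 2000, "render": 1750, "avg_loop": 17.5, "fastest_loop": 10, "slowest_loop": 45}
    print(round(testcube_index(fast_heat, ie7_heat), 2))  # 2.0
    ```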

  • The SlickSpeed CSS selectors test suite is a new and probably controversial addition to our suite, but as you'll see, we addressed the reason for the controversy and compensated for it. As JavaScript developers know, there is a multitude of third-party libraries, in addition to the browser's native JavaScript library, that enable browsers to access elements of a very detailed and intricate page (among other things). For our purposes, we've chosen a modified version of SlickSpeed by Llama Lab, which covers many more third-party libraries, including Llama's own. This version tests no fewer than 56 shorthand expressions for accessing certain page elements, all of which are supposed to be commonly supported by every JavaScript library. These expressions are called CSS selectors (one of the tested libraries, called Spry, is supported and documented by Adobe).

    So Llama's version of the SlickSpeed battery tests 56 selectors across 10 libraries, including each browser's native JavaScript (which should follow prescribed Web standards). Multiple iterations of each selector are run, and the final elapsed times are reported. Here's the controversial part: Some have said the final times are meaningless because not every selector is supported by every browser; although SlickSpeed marks each selector that generates an error in bold black, the elapsed time for an error is usually only 1 ms, while a non-error can run as high as 1,000 ms. We compensate by using a scoring system that penalizes each error by 1/56 of the total, so only the good selectors are scored and the rest "get zeroes."

    Here's where things get hairy: As some developers already know, IE7 scored all zeroes for native JavaScript selectors. It's impossible to compare a good score against no score, so to fill the hole, we use the geometric mean of IE7's positive scores with all the other libraries as the base number against which to compare the native JavaScript scores of the other browsers, including IE8. The times for each library are compared against IE7's, with penalties assessed for each error (Firefox, for example, can generate 42 errors out of 560, for a penalty of 7.5%). Then we take the geometric mean, rather than the simple average, of each battery -- we do this because we're comparing the same functions across each library, not different categories of functions as with the other suites. The geometric mean accounts better for fluctuations and anomalies. A sketch of this scoring appears below.
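    The arithmetic below is a minimal sketch of that scoring, assuming the per-error penalty is 1/56 of a library's score and that per-library scores are time ratios relative to IE7; only those ingredients, the IE7 native-JavaScript workaround, and the use of a geometric mean come from the description above.

    ```python
    # Illustrative sketch of the SlickSpeed scoring described above.
    from math import prod

    SELECTORS_PER_LIBRARY = 56

    def library_score(ie7_ms: float, browser_ms: float, errors: int) -> float:
        """One library's relative score: IE7's elapsed time over the tested browser's,
        reduced by 1/56 for every selector that errored out ("gets a zero")."""
        return (ie7_ms / browser_ms) * (1 - errors / SELECTORS_PER_LIBRARY)

    def geometric_mean(values):
        return prod(values) ** (1 / len(values))

    # IE7 produced no usable native-JavaScript times, so the baseline for that
    # comparison is the geometric mean of IE7's positive figures from the other libraries.
    def native_js_baseline(ie7_positive_times):
        return geometric_mean(ie7_positive_times)

    # A browser's overall SlickSpeed index is the geometric mean -- not the
    # arithmetic average -- of its per-library scores, since every library runs
    # the same 56 selectors rather than different categories of functions.
    def slickspeed_index(per_library_scores):
        return geometric_mean(per_library_scores)
    ```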

  • Nontroppo table rendering test. As has already been proven in the field, CSS is the better platform for rendering complex pages with magazine-style layouts. Still, a great many of the world's Web pages continue to use HTML's old <TABLE> element (created to render data in formal tables) to divide pages into grids. We heard from you that as long as IE7 is still important (it is our index browser, after all), old-style table rendering should still be tested, and we've decided to concur.

    The creator of our CSS rendering test has built a similar platform for testing not only how long it takes a browser to render a huge table, but also how soon the individual cells (<TD> elements) of that table are available for manipulation. From a single starting mark, the test times how long the browser takes to begin rendering the table and how long to finish it, yielding two index scores. It also times the loading of the page, for a third index score. Then we have it re-render the contents of the table five times and average the elapsed times, for a fourth score. The four items are then averaged together for a cumulative score, along the lines sketched below.
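    A minimal sketch of that cumulative score follows, with field names of our own invention; the recipe itself -- four timings, each compared against IE7, then averaged -- is as described above.

    ```python
    # Illustrative sketch of the table-test cumulative score relative to IE7.

    def relative(ie7_ms: float, browser_ms: float) -> float:
        """Index scores are relative to IE7: above 1.0 means faster than IE7."""
        return ie7_ms / browser_ms

    def rerender_average(times_ms: list) -> float:
        """Mean elapsed time of the five re-renders of the table's contents."""
        return sum(times_ms) / len(times_ms)

    def table_test_score(browser: dict, ie7: dict) -> float:
        """browser/ie7: {"render_start", "render_end", "page_load", "rerender_avg"} in ms."""
        keys = ("render_start", "render_end", "page_load", "rerender_avg")
        subscores = [relative(ie7[k], browser[k]) for k in keys]
        return sum(subscores) / len(subscores)
    ```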

  • Nontroppo standard browser load test. (That Nontroppo gets around, eh?) This may very well be the most generally boring test in the suite: an extremely ordinary page with ordinary illustrations, followed by a block full of nested <DIV> elements. But it allows us to take away all the variable elements and concentrate on straight rendering and relative load times, especially when we launch the page locally. It produces document load time, document-plus-images load time, DOM load time, and first access time, each of which is compared to IE7 and then averaged.
  • Acid3 standards compliance test. The function of the Acid3 test has changed dramatically, especially as most of our browsers approach full compliance. IE7 scored only 12% on Acid3; today, most of the alternative browsers are at 100% compliance, with Firefox at 93% and flirting with 94%. So it means less now than it did in earlier months for Acid3 to yield an index score of 8.33, the relative score any browser earns for a perfect 100% when measured against IE7's 12%. Now that cumulative index scores are closer to 20, having an eight-and-a-third in the mix has become a deadweight rather than a reward.
    So for the first time, we're making Acid3 count in a different way: For the other batteries that have to do with rendering (all three Nontroppo tests and TestCube 3D), plus the native JavaScript library portion of the SlickSpeed test, we multiply the index score by the browser's Acid3 percentage. As a result, any non-compliance with the Web Standards Project's assessment is applied as a penalty against those rendering scores, as sketched below. Today, only the Mozilla and Microsoft browsers are affected by this penalty, and Firefox only slightly -- all the others are unaffected.
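    Here's a minimal sketch of how that multiplier could be applied; the battery names are placeholders, but the rule -- rendering-related scores multiplied by the Acid3 percentage, everything else untouched -- is the one described above.

    ```python
    # Illustrative sketch of the Acid3 compliance penalty on rendering scores.

    RENDERING_BATTERIES = {"nontroppo_css", "nontroppo_table", "nontroppo_load",
                           "testcube_3d", "slickspeed_native_js"}

    def apply_acid3(scores: dict, acid3_percent: float) -> dict:
        """scores: battery name -> index score; acid3_percent: e.g. 93 for Firefox."""
        factor = acid3_percent / 100.0
        return {name: score * factor if name in RENDERING_BATTERIES else score
                for name, score in scores.items()}

    # A browser at 100% compliance is unaffected; Firefox at 93% gives up 7% of
    # each rendering score, and IE7's 12% cuts its rendering scores sharply.
    ```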

The physical test platform we've chosen is a triple-boot system, which enables us to boot different Windows versions from the same large hard drive. Our platforms are Windows XP Professional SP3, Windows Vista Ultimate SP2, and the Windows 7 Release Candidate.

All platforms are always brought up to date with the latest Windows updates from Microsoft prior to testing. We realize, as some have told us, that this could alter the speed of the underlying platform. However, we expect real-world users to be making the same changes, rather than continuing to use unpatched and outdated software. Certainly the whole point of testing Web browsers on a continual basis is that folks want to know how Web browsers are evolving, and to what degree, on as close to a real-time scale as possible. When we update Vista, we re-test IE7 on that platform to ensure that all index scores remain relative to the most recent available performance, even of that aging browser on that old platform.

The physical test unit is an Intel Core 2 Quad Q6600-based computer with a Gigabyte GA-965P-DS3 motherboard, an Nvidia 8600 GTS-series video card, 3 GB of DDR2 DRAM, and a 640 GB Seagate Barracuda 7200.11 hard drive (among others). The Windows XP SP3, Vista SP2, and Windows 7 RC partitions all reside on this drive. Since May 2009, we've been using a physical platform for browser testing, replacing the virtual test platforms we had used up to that time. Although a few more steps are required to manage testing on a physical platform, you've told us you believe the results of physical tests will be more reliable and accurate.
