Betanews Relative Performance Index for browsers 3.0: How it works and why

The newest member of the suite

Decades ago, back when I tested the efficiency of BASIC and C compilers, I published the results under my old pseudonym, "D. F. Scott." In honor of my former "me," I named our latest benchmark test battery "DFScale."

It has ten components, two of which are broken into two sub-components, for a total of 12 tests. Each component reports its results against a grid of iteration counts: 20, 22, 24, 30, 100, 250, 500, 1,000, 10,000, 25,000, 50,000, 100,000, 250,000, 500,000, 1,000,000, and 10,000,000. Not every component can go that high: at the upper levels, the browser can crash, hang, or even generate stack overflow errors. Until I am satisfied that these errors are not exploitable for malicious purposes, I will not release the DFScale suite to the public.

At the low end of the scale, I scale some components up, because iteration counts that small are too low to support meaningful judgments. Conversely, for a test whose difficulty grows exponentially -- and rapidly -- such as the Fibonacci sequence, I stick to the low end of the scale.

Some tests involve arrays of random integers, whose size and range of values both correspond to the grid numbers. So an array of 10,000 elements, for example, would contain random values from 1 to 10,000. Yes, we use the fairer random number generator and value shuffler that Microsoft adopted to correct its browser choice screen, in an incident brought to the public's attention in early March by IBM's Rob Weir. In fact, the shuffler is one of our tests; and in honor of Weir, we named the test the "Rob Weir Shuffle."
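
For illustration, here is a minimal sketch of the kind of unbiased shuffle Weir endorsed -- the Fisher-Yates approach -- applied to an array of grid-sized integer values. The function name and the details are our own rendering, not necessarily the suite's exact code:

    // Build an array holding the values 1 through size, then shuffle it
    // without bias: walk backward, swapping each element with a randomly
    // chosen element at or below its position (the Fisher-Yates method).
    function buildShuffledArray(size) {
        var values = new Array(size);
        for (var n = 0; n < size; n++) {
            values[n] = n + 1;
        }
        for (var i = size - 1; i > 0; i--) {
            var j = Math.floor(Math.random() * (i + 1));
            var temp = values[i];
            values[i] = values[j];
            values[j] = temp;
        }
        return values;
    }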

To ensure fairness, we randomize and shuffle the arrays the components will use (except for the separate "Weir Shuffle" test) once, at the start of the run. Each component then sorts copies of the same shuffled arrays, rather than regenerating new ones.
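
A hypothetical sketch of how that pre-shuffling and copying might look in practice -- the grid array mirrors the iteration counts listed above, and the helper names (including buildShuffledArray from the sketch above) are ours:

    // Shuffle each master array once at startup; hand every sorting
    // component a fresh copy, so each algorithm and each browser starts
    // from identical input.
    var gridSizes = [20, 22, 24, 30, 100, 250, 500, 1000, 10000, 25000,
                     50000, 100000, 250000, 500000, 1000000, 10000000];
    var masterArrays = {};
    for (var g = 0; g < gridSizes.length; g++) {
        masterArrays[gridSizes[g]] = buildShuffledArray(gridSizes[g]);
    }

    function timeSortComponent(sortFunction, size) {
        var copy = masterArrays[size].slice(0);   // copy; the master stays intact
        var start = new Date().getTime();
        sortFunction(copy);
        return new Date().getTime() - start;      // raw time, in milliseconds
    }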

Reiterative loops run uninterrupted on each browser, which is something that some browsers don't like. Internet Explorer stops "runaway" loops to ask the user if she wishes to continue; to keep this from happening, we tweaked the System Registry. Chrome puts up a warning giving the user the option to stop the loop, but that warning is on a separate thread which apparently does not slow down the loop.

Each component of DFScale produces a pair of scores, for speed and scalability, which we tally in Excel. The speed score is the average of the iterations per second across the grid numbers. The scalability score is built from two elements: First, the raw times for each grid number are plotted on the y-axis, against the number of iterations on the x-axis. We then estimate a best-fit line and calculate its slope; the lower the slope, the higher the score. Second, we estimate a best-fit exponential curve, representing the amount of acceleration a JIT compiler can provide over time. Again, the lower the curve's coefficient, the flatter the curve, and the higher the score.

Rather than attach arbitrary numbers to that score, we compare the slope and coefficient values to those generated by a test of IE7 on Vista SP2. Otherwise, saying that browser A scored a 3 and browser B scored a 12 would be meaningless -- twelve of what? So the final score for a component is 50% slope, 50% curve, relative to IE7.
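
For readers who want to see the arithmetic, here is a rough sketch of how those elements might be computed. The least-squares fitting shown here is our illustration of the idea, not a description of the exact Excel formulas:

    // Scalability element 1: the slope of a best-fit straight line through
    // the (iterations, raw time) points, by ordinary least squares.
    function leastSquaresSlope(xs, ys) {
        var n = xs.length, sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (var i = 0; i < n; i++) {
            sumX  += xs[i];
            sumY  += ys[i];
            sumXY += xs[i] * ys[i];
            sumXX += xs[i] * xs[i];
        }
        return (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
    }

    // Scalability element 2: the coefficient b of a best-fit exponential
    // curve y = a * e^(b*x), found by fitting a line to the logarithms of
    // the raw times. A flatter curve yields a smaller b.
    function exponentialCoefficient(xs, ys) {
        var logYs = [];
        for (var i = 0; i < ys.length; i++) {
            logYs.push(Math.log(ys[i]));
        }
        return leastSquaresSlope(xs, logYs);
    }

    // Final component score: 50% slope, 50% curve, each expressed relative
    // to the same measurements taken from IE7 on Vista SP2, so that lower
    // (flatter) values than the baseline raise the score.
    function componentScore(slope, coeff, ie7Slope, ie7Coeff) {
        return 0.5 * (ie7Slope / slope) + 0.5 * (ie7Coeff / coeff);
    }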

The ten tests in the (current) DFScale suite are as follows:

  • The "Rob Weir Shuffle" - or rather, the fairer array shuffling algorithm made popular by Weir's disassembly of the standard random number generator in IE.
  • The Sieve of Eratosthenes (which I used so often during the '80s that several readers suggested it be officially renamed "The D. F. Sieve"). It's the algorithm used to identify prime numbers in a long sequence; and here, the sequence lengths are determined by the grid numbers. A brief sketch appears after this list.
  • The Fibonacci sequence, in which each integer is the sum of the previous two. Of the many ways of expressing this algorithm, we picked two with somewhat different construction. The first compresses the entire comparison operation into a single instruction, using the compact conditional (or "ternary") syntax first introduced in C. The second breaks the operation down into multiple instructions -- and because it's longer, you'd think it would be harder to run. It's not, because the compact syntax ends up being more difficult for JIT compilers to digest, which is why the longer form has been dubbed the "Fast Fib." Both forms are sketched after this list.
  • The native JavaScript sort() method, which isn't really a JavaScript algorithm at all, but a basic test of each browser's ability to sort an array using native code. In the real world, the sort() method is the typical way developers will handle large arrays; they don't include their own QuickSort or Heap Sort scripts. The reason we test these other sorting algorithms (to be explained further) is to examine the different ways JavaScript interpreters handle and break down reiterative tasks. The native method can conceivably act as a benchmark for relative efficiency; and amazingly, external scripts can be faster than the native method, especially in Google Chrome.
  • Radix Sort is an algorithm that simulates the actions of an old IBM punch-card sorter, which sorted a stack of cards based on their 0 - 9 digits at various locations. No one would dare use this algorithm today for real sorting, but it does test each browser's ability to scale up a fixed and predictable set of rules, when the complexity of the problem scales exponentially. Nearly all browsers are somewhat slower with the Radix at larger values, with IE8 dramatically slower.
  • QuickSort is the time-tested "divide-and-conquer" approach to sorting lists or arrays, and it often remains the algorithm of choice for large indexes. It picks a random array value, called a "pivot point," and then shuffles the others to one side or the other based on their relative value. The result is groups of lower and higher values, separated from one another, which can in turn be sorted as smaller groups.
  • Merge Sort is a more complex approach to divide-and-conquer, often used for sorting objects with properties rather than just plain arrays. It divides a complex array into smaller arrays within a binary tree, sorts the smaller ones, and then merges the results back into larger ones as it goes along. It's also very memory intensive, because it breaks the problem up all over the place.
  • Bubble Sort is the quintessential test of the simplest, if least efficient, methodology for attacking a problem: It repeatedly compares two adjacent values in the array and keeps moving the higher one forward...until it can't do that anymore, at which point it declares the array sorted. As a program, it breaks down extremely simply, which should give an interpreter an excuse to find that extra gear...if it has one. If it doesn't, the sort times will balloon far faster than the problem size grows. A brief sketch appears after this list.
  • Heap Sort has been called the most stable performer of all the sort algorithms. Naturally, this is the one that gives our browsers the most fits -- the stack overflow errors happen here. Safari handles the error condition by simply refusing to crash, ending the loop instead -- which speaks well for its security -- though as a result it cannot perform this test at ten million iterations. Other browsers...can force a reboot if the problem size is too high. Essentially, Heap Sort's task is to weed out high values from a shrinking array; the weeds are piled atop a heap, and the pile forms a sorted array.
  • Project Euler's Problem #14 is a wonderful discovery I made while searching for a test that was similar in some ways to, but different in others from, the Fibonacci sequence. The page where I discovered it grabbed my attention with its headline, "Yet Another Meaningless JavaScript Benchmark," which somehow touched my soul. The challenge of this problem is to demonstrate the "unproven conjecture" that a particular chain of numbers, starting with any given integer value and following only two rules, will eventually pare down to 1. The rules are: 1) if the value is even, the next value in the chain is the current one divided by 2; 2) if the value is odd, the next one is equal to three times the current one, plus 1. Our test runs the conjecture over the first 1,000, then 10,000, then 25,000 integers, and so on up the grid. The problem should scale almost linearly, giving interpreters the opportunity to push the envelope and accelerate. In our first test of the IE9 Tech Preview, it appeared to be accelerating very well, running the sequences for the first 250,000 integers in 0.794 seconds, versus 10.9 seconds for IE8. Alas, IE9 is not yet stable enough to run the test with higher values.

    Problem 14 is also interesting for us because we test it two ways, as suggested by blogger and developer Motti Kanzkron: He realized that, at some point, a new sequence will always touch upon a value reached in a previous sequence, and from that point on, its successive values in the chain will always be the same. So there's no need to complete the rest of the sequence if it's already been computed once; his optimization spends memory to tack old sequences onto new ones and move forward. We compare the brute-force "naïve" version to the smart "optimized" version, to judge a JIT compiler's ability to optimize code against an optimization that would be more obvious to a programmer. Both versions are sketched just below.
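
To make that naïve-versus-optimized comparison concrete, here is a minimal sketch of both forms of the Problem 14 chain test. The memoized version follows the general idea described above, but the names and the bookkeeping details are our own:

    // Naive form: walk every chain all the way down to 1, every time.
    function chainLengthNaive(n) {
        var length = 1;
        while (n !== 1) {
            n = (n % 2 === 0) ? n / 2 : 3 * n + 1;
            length++;
        }
        return length;
    }

    // Memoized form, in the spirit of the optimization described above:
    // once a value's chain length is known, any later chain that reaches
    // that value can stop there and reuse the stored result.
    var knownLengths = { 1: 1 };
    function chainLengthMemoized(n) {
        var path = [];
        while (!(n in knownLengths)) {
            path.push(n);
            n = (n % 2 === 0) ? n / 2 : 3 * n + 1;
        }
        var length = knownLengths[n];
        for (var i = path.length - 1; i >= 0; i--) {
            length++;
            knownLengths[path[i]] = length;   // record every value passed on the way
        }
        return length;
    }

    // The test itself runs the conjecture over the first N integers,
    // where N is one of the grid numbers.
    function runProblem14(N, chainLength) {
        var longest = 0;
        for (var start = 1; start <= N; start++) {
            longest = Math.max(longest, chainLength(start));
        }
        return longest;
    }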
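
Likewise, a minimal sketch of the Sieve of Eratosthenes component, where the length of the sequence being searched for primes is one of the grid numbers (again, our own rendering rather than the suite's exact code):

    // Mark the multiples of each prime as composite; whatever is left
    // unmarked is prime.
    function sieveOfEratosthenes(limit) {
        var isComposite = new Array(limit + 1);
        var primes = [];
        for (var candidate = 2; candidate <= limit; candidate++) {
            if (!isComposite[candidate]) {
                primes.push(candidate);
                for (var multiple = candidate * candidate; multiple <= limit; multiple += candidate) {
                    isComposite[multiple] = true;
                }
            }
        }
        return primes;
    }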
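
The two Fibonacci forms, as we understand the distinction, look roughly like this: the compact form leans on the conditional ("ternary") operator, while the expanded "Fast Fib" spells the branch out as separate statements. Both are illustrative sketches:

    // Compact form: the whole comparison collapses into one ternary expression.
    function fibTernary(n) {
        return (n < 2) ? n : fibTernary(n - 1) + fibTernary(n - 2);
    }

    // Expanded form -- the "Fast Fib": the same recursion, broken into
    // separate statements, which JIT compilers reportedly digest more easily.
    function fibExpanded(n) {
        if (n < 2) {
            return n;
        }
        var a = fibExpanded(n - 1);
        var b = fibExpanded(n - 2);
        return a + b;
    }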
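
Finally, a sketch of the Bubble Sort component, whose repeated passes over the array are what make its times balloon as the array grows:

    // Compare adjacent values and swap the higher one forward, over and
    // over, until a full pass produces no swaps.
    function bubbleSort(values) {
        var swapped = true;
        while (swapped) {
            swapped = false;
            for (var i = 0; i < values.length - 1; i++) {
                if (values[i] > values[i + 1]) {
                    var temp = values[i];
                    values[i] = values[i + 1];
                    values[i + 1] = temp;
                    swapped = true;
                }
            }
        }
        return values;
    }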

Next: The other third-party test batteries...
