With the petaflop barrier broken, is it time to change the benchmark?
The Roadrunner supercomputer now performs more than 1.1 quadrillion floating-point operations per second. But what's an "operation," really? By the time the next Top 500 list comes out, the way we count them could change.
At a presentation at SC08, the annual supercomputing conference in Austin, Texas, an expert on the Linpack benchmark affiliated with Oak Ridge National Laboratory in Tennessee suggested that the methodology used to determine supercomputer performance with Linpack may be behind the times. Specifically, Jack Dongarra -- the man credited with introducing the High-Performance Linpack (HPL) benchmark to the Top 500 program -- suggested that as supercomputers get bigger and can hold larger problems, the time needed to run the benchmark grows far faster than the machines themselves. This implies that simply scaling the test up along with the machines eventually reaches a point of diminishing returns.
In the presentation (PDF available here), Dongarra states up front that it's only natural to test a supercomputer cluster with a problem size proportionate to its capacity. HPL's problem size is the dimension of the dense matrix it factors; whereas a desktop computer might be tested with Linpack at a problem size of 1,000, the matrix size for supercomputers runs into the millions. The #1 contender in this week's Top 500, Los Alamos National Laboratory's IBM Roadrunner, ran HPL with a problem size (n) of 2,300,000 and completed the run in about two hours. Oak Ridge fielded the #2 and #8 contenders, both of them Cray XTs, with the #2 system -- Jaguar -- also breaking the petaflop barrier this year.
Jaguar has more memory available to it than Roadrunner, so in one test, Oak Ridge increased the problem size in proportion to that capacity, to an n of 4,700,000 (4.7 x 10^6). As Dongarra reported, that test took Jaguar 16 hours to complete.
At the rate at which supercomputers are presently scaling, he told attendees, it would be perfectly reasonable to adjust the matrix problem size to 33,500,000 by 2012. That would only seem fair: scale the problem size to the extent that the clusters themselves are scaling. But with current architectures, and at the current rate of performance falloff, Dongarra predicted the completion time could balloon to more than two and a half days.
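The arithmetic behind those run times is easy to sketch. HPL factors a dense n-by-n matrix, which costs roughly (2/3)n^3 + 2n^2 floating-point operations, while the matrix itself grows only as n^2 -- so the work outpaces the memory that holds it. The snippet below is a back-of-the-envelope estimate of that scaling, not Dongarra's own model: it assumes each machine sustains its published Rmax for the entire run, and the rate used for the hypothetical 2012-class system is purely an assumed figure.

```python
# Back-of-the-envelope HPL run-time estimate: time ~ flops / sustained rate.
# Assumes the machine holds its Top 500 Rmax for the whole run, which real
# runs do not quite do -- the point is the cubic growth in work.

def hpl_flops(n: int) -> float:
    """Approximate floating-point operations for an n x n HPL factorization."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2

def hours(n: int, rmax_pflops: float) -> float:
    """Estimated wall-clock hours at a sustained rate of rmax_pflops petaflops."""
    return hpl_flops(n) / (rmax_pflops * 1e15) / 3600.0

# Roadrunner: n = 2,300,000 at ~1.1 petaflops -> roughly two hours.
print(f"Roadrunner: {hours(2_300_000, 1.105):.1f} h")

# Jaguar: n = 4,700,000 at ~1.06 petaflops -> in the neighborhood of the
# 16 hours Dongarra reported.
print(f"Jaguar:     {hours(4_700_000, 1.059):.1f} h")

# Hypothetical 2012-class run: n = 33,500,000 at an assumed 100 petaflops.
# Even with that generous rate, the run is measured in days.
print(f"2012-class: {hours(33_500_000, 100.0) / 24:.1f} days")
```

Because the work grows with the cube of the problem size while the matrix grows only with its square, scaling the problem along with the machine inevitably stretches the run.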
What's the solution? Dongarra's presentation showed how measured performance tapers off over the course of an entire benchmark run -- which it's expected to do. Conceivably, if a segment of the run were captured before the big dropoff near the end, a formula for that dropoff could be estimated from the curve. That way, the test wouldn't take days -- or conceivably even weeks -- for each iteration.
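Dongarra didn't spell out the mechanics, but one way such a scheme could work is to log the sustained rate over the early portion of the factorization, fit a simple model to the observed decline, and project the rest of the run rather than executing it. The sketch below is purely illustrative of that idea -- the quadratic decay model, the sample numbers, and the 40-percent cutoff are all assumptions for the sake of the example, not anything taken from the presentation.

```python
# Illustrative sketch only: project a full HPL run's wall-clock time from a
# partial run. The decay model, sample data, and cutoff are assumptions.
import numpy as np

# Suppose the run was stopped 40% of the way through, having logged the
# sustained rate (teraflops) at several checkpoints of work completed.
progress = np.array([0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40])
rate_tflops = np.array([1100, 1095, 1088, 1079, 1070, 1059, 1047, 1034])

# Fit a simple quadratic to the observed dropoff and extrapolate it over the
# 60% of the run that was never executed.
model = np.poly1d(np.polyfit(progress, rate_tflops, deg=2))
full_run = np.linspace(0.0, 1.0, 201)

# Total time = total work x the average of (1 / rate) over the work fraction.
total_flops = (2.0 / 3.0) * 2_300_000**3          # assumed problem size n
seconds = total_flops * np.mean(1.0 / (model(full_run) * 1e12))
print(f"Projected full-run time: {seconds / 3600:.1f} hours")
print(f"Projected average rate:  {total_flops / seconds / 1e15:.2f} PFLOPS")
```

The hard part of any such shortcut is validating that the extrapolated dropoff matches what the machine would actually have done -- which is precisely why it would mean changing the benchmark itself.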
But implementing the solution may mean changing the benchmark, and quite possibly impacting the results. This from a guy who works for the laboratory whose best contender placed second.
Dongarra's presentation closed with the following: "We are planning on making changes and will probably be ready after ISC in Hamburg." That's the very next supercomputing conference, slated for June 23 in Hamburg, Germany, and that's when the next Top 500 list will be published. At that time, we may see whether changing the way we measure tasks that scale with their respective clusters changes, in turn, the way we perceive supercomputing.