The winner by knockout: Los Alamos claims the Top 500 throne
The petaflop barrier was not only broken last week, it was pulverized into infinitesimally small, neutrino-sized particles. While Intel continues to blow even more horns, suddenly it's the Cell processor that has engineers talking.
In November 2003, when the Oklahoma Sooners football team blew out Texas A&M by a score of 77-0, a sportscaster was heard to say, "It wasn't that close." This morning the news arrived from Mannheim -- after a few days' delay, apparently to celebrate -- of the absolute trouncing of the once unstoppable IBM BlueGene/L supercomputer by, quite literally, a hybrid collection of AMD Opterons and parts you'd find in a PlayStation 3. It was not a vanquishing, not a clobbering, but a mathematical and systematic decimation of the former champion by 231%, the type of blowout that the late, great Jim McKay would have loved to describe, up close and personal.
As it turned out, Los Alamos National Laboratory's new Roadrunner system didn't really have a lot of competition after all. The race to break the petaflop barrier, to compute one quadrillion floating-point operations per second, wasn't even a race. Roadrunner broke away from the pack, and not even Chuck Jones could have adequately caricatured its competition.
Here's how the rankings work: Twice each year, the world's top 500 supercomputers are ranked by the University of Mannheim in association with Lawrence Berkeley National Laboratory and the University of Tennessee, Knoxville. Those assessments use the industry standard Linpack benchmark. Supercomputers are sorted by each tested cluster's maximal observed performance, in gigaflops (GFlops, or billions of floating-point operations per second). This performance is called the "Rmax rating," although Mannheim also publishes theoretical peak performance ("Rpeak") as a comparison, representing how fast the system architects believe each system could or should perform. Dividing the Rmax rating by the Rpeak rating produces a yield figure, which represents how well each system is performing relative to its engineers' expectations.
The Roadrunner supercomputer at Los Alamos National Laboratory, the first to break the petaflop processing barrier.
LANL's Roadrunner system eclipsed the petaflop barrier, as was announced last week (it's hard for these government laboratories to keep secrets nowadays). The Rmax score for Roadrunner was 1,026,000 GFlops (over a million billion operations per second). And with an Rpeak score of 1,375,780 GFlops, that means the new system was cruising along at a comfortable 75% yield.
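For readers who want to check the yield arithmetic, here is a minimal sketch in Python. The function name `linpack_yield` is our own invention for illustration, not anything published by the Top 500 organizers; the two scores are the Roadrunner figures quoted above.

```python
def linpack_yield(rmax_gflops: float, rpeak_gflops: float) -> float:
    """Return the yield (Rmax divided by Rpeak) as a fraction between 0 and 1."""
    if rpeak_gflops <= 0:
        raise ValueError("Rpeak must be a positive number of GFlops")
    return rmax_gflops / rpeak_gflops


# Roadrunner's scores as quoted in the article, in GFlops.
ROADRUNNER_RMAX = 1_026_000
ROADRUNNER_RPEAK = 1_375_780

roadrunner_yield = linpack_yield(ROADRUNNER_RMAX, ROADRUNNER_RPEAK)
print(f"Roadrunner yield: {roadrunner_yield:.1%}")
```

The result comes out a shade under 75%, which is where the rounded figure comes from.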
BlueGene/L, we learned this morning, never turned in a faster score than the one Lawrence Livermore Labs posted last November, so it stays #2 with an Rmax of 478,200. And while BlueGene/L has 212,992 processors, Roadrunner gets away with just 122,400. That substance those UC engineers are noticing for the first time this morning is dust.
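To put the gap in numbers, the sketch below compares the two leaders using only the Rmax scores and processor counts quoted here. The per-processor figure is a crude average of our own devising: it takes the published processor counts at face value and ignores the fact that Roadrunner mixes two very different kinds of chip. A comparison on another measure, such as Rpeak, would give a different percentage.

```python
# (Rmax in GFlops, processor count) as quoted in the article.
systems = {
    "Roadrunner": (1_026_000, 122_400),
    "BlueGene/L": (478_200, 212_992),
}

# Average measured throughput per processor, in GFlops (a rough figure only).
per_processor = {
    name: rmax / processors for name, (rmax, processors) in systems.items()
}

# How many times larger Roadrunner's Rmax is than BlueGene/L's.
rmax_lead = systems["Roadrunner"][0] / systems["BlueGene/L"][0]

print(f"Roadrunner Rmax is {rmax_lead:.2f}x BlueGene/L's")
for name, gflops in per_processor.items():
    print(f"{name}: {gflops:.2f} GFlops per processor")
```

On these figures Roadrunner posts a little more than twice the measured score with not much more than half the processor count.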
Though Roadrunner is a hybrid, Mannheim U. officially classifies it as a Power processor-based system, probably because it's built by IBM. This season, Power CPUs including PowerPC and Cell processors powered 68 of the Top 500, and five of the top 10 including #1, #2, #3, #6, and #9. IBM's BlueGene supercomputer architecture still holds 7 of the top 20 slots, though Roadrunner is not a BlueGene -- it's based instead on a new schematic of BladeCenter clusters.
There are two other CPU manufacturers in the world besides IBM, in case you've forgotten. Last November, Intel captured 354 slots in the Top 500; this season, Intel is still surging ahead with 375. An incredible 356 of those use 64-bit x86 architecture CPUs (EM64T), with 64-bit Itaniums (IA-64) dwindling to just 16, and 32-bit x86 processors (IA-32) down to 3.
Intel's fastest performer this season slipped down several notches, though, to #7. It was last November's #3 performer, an Altix ICE 8200 cluster built for the New Mexico Computing Applications Center (what is it these days with New Mexico?), and it even sped up somewhat with an Rmax of 133,200. But only six Intel-based systems cracked the top 25 slots this time around.
What's left for AMD? Well, it's claiming at least a partial share of this season's crown, with AMD Opterons populating roughly half of Roadrunner (the precise Cell-to-Opteron split hasn't been nailed down publicly). As for systems populated entirely by Opterons, AMD can claim the #4 slot: the SunBlade x6420 cluster built for the Texas Advanced Computing Center using quad-core Opterons, which Sun had hoped would finally put away IBM by beating it to the petaflop barrier.
But how bad was the damage? That SunBlade cluster barely made 326,000 GFlops, at about 65% yield. Had its architects projected a higher theoretical peak, that yield number would have been lower still. The news gets worse for AMD, which slipped from 113 systems on the list to a measly 55. Yes, there are now more Power-powered systems on the list than Opteron-powered ones.
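The yield relationship can also be run backwards to estimate the theoretical peak behind that SunBlade score. This is only a rough sketch: the 65% figure is rounded, so the result is approximate, and both the helper name `implied_rpeak` and the 600,000 GFlops "what if" peak are made up for illustration rather than taken from the list.

```python
def implied_rpeak(rmax_gflops: float, yield_fraction: float) -> float:
    """Estimate Rpeak from a measured Rmax and an approximate yield fraction."""
    if not 0 < yield_fraction <= 1:
        raise ValueError("yield must be a fraction greater than 0 and at most 1")
    return rmax_gflops / yield_fraction


SUNBLADE_RMAX = 326_000   # GFlops, as quoted in the article
APPROX_YIELD = 0.65       # "about 65%", a rounded figure

estimated_rpeak = implied_rpeak(SUNBLADE_RMAX, APPROX_YIELD)
print(f"Implied Rpeak: roughly {estimated_rpeak:,.0f} GFlops")

# Hypothetical: had the architects claimed a higher peak, the same measured
# score would translate into a lower yield.
HYPOTHETICAL_RPEAK = 600_000  # invented value, not from the Top 500 list
hypothetical_yield = SUNBLADE_RMAX / HYPOTHETICAL_RPEAK
print(f"Yield at the hypothetical peak: {hypothetical_yield:.1%}")
```

In other words, the measured score stays fixed; only the denominator, the architects' claim, moves the yield.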
What's everybody running software-wise these days? That is also becoming a non-contest, as Linux has pervaded 458 of the Top 500 systems. Microsoft had wanted to make a splash at this week's supercomputing conference, but that's going to be a little hard with only 5 systems on the big list. The fastest Windows-powered computer this time around was indeed faster, and a little bigger: Coming in at #20, it's a Dell PowerEdge 1955 cluster built for the NCSA. With 9,600 processors, its score was 68,480 GFlops, roughly one-fifteenth of Roadrunner's, which only illustrates the order-of-magnitude difference between the bottom of the top 20 and the top.
And there's always someone who wants to know: What's the fastest Mac on the list? There were two this time, the fastest being that Xserve cluster built for COLSA, though it hasn't gotten any faster since November, and this season it slipped down to #141.