AMD Responds to Intel with Its Own Teraflop Concept
Last month at the International Solid-State Circuits Conference, Intel showed off a kind of "concept CPU," much the way automakers parade their concept vehicles at the Detroit Auto Show. Intel's chip was observed to sustain a throughput of one teraflop: one trillion floating-point operations per second.
Never one to be outdone for long, AMD answered back yesterday at a press event in San Francisco with a system design that, while not a single CPU, makes the case for "teraflop on a budget."
Rather than pile 80 cores onto a single chip, as Intel's design does, AMD's teraflop system consists of a pair of dual-core Opteron processors coupled with a pair of R600 Stream processors. If you haven't heard of AMD's R600 processors, that's because the R600 came to AMD through its acquisition of ATI. R600 processors are essentially GPUs, but in this design they're being co-opted for general-purpose processing.
The typical high-performance server, AMD points out, is capable of 100 billion floating-point operations per second (100 gigaflops), and high-performance servers tend to be 2P units, if not 4P. One teraflop is ten times that, which opens up a dramatic new possibility: future server upgrades in which adding one or two GPUs is more cost-effective than adding more processors.
The roadblock to this ideal, however, is in the details: applications that use GPUs for general-purpose work have to be compiled specifically for that purpose. So far, a GPU cannot be made to emulate a CPU and take over the job of another core, and such an emulation approach might not be practical anyway. Still, both AMD and Intel have researched co-opting the GPU for CPU tasks since the 1990s, and nVidia recently jumped into the field by introducing its own C++ compiler for GPU parallelism.
AMD has yet to release many specific details about its concept system, including the precise model of Opterons used, and what other standard or non-standard components were employed on the motherboard. But one interesting detail was mentioned early: AMD's prototype runs Windows XP Professional.
This is interesting in light of recent statements AMD has made regarding Non-Uniform Memory Access (NUMA), a principal part of AMD's multicore design since its inception. NUMA is rooted in the concept that multiple cores need not queue up all their memory accesses in sequence along a single memory bus. AMD's HyperTransport acts as an interconnect between processors, and that design is what gave AMD its early performance advantage over Intel with its first multicore processors.
Now that Intel's recent performance numbers have wrested that advantage back, AMD has argued that the tests favoring Intel were run on Windows XP rather than Windows Vista, whose improved NUMA drivers would make better use of AMD's memory architecture, possibly helping AMD processors close the gap.
In a recent interview with BetaNews, AMD senior performance analyst Mark Welker said that when an AMD Quad FX-based system runs Vista rather than XP, newer drivers let the system take advantage of the fact that each processor in the Quad FX pair has its own memory controller, so the platform has two memory controllers as well as two dual-core processors. That, coupled with Vista's improved handling of NUMA, should give dual-processor systems what Welker described as a "leap" in performance. While Quad FX was introduced just recently, Opteron processors were among the first to support NUMA.
64-bit Vista, Welker said, would be even better than 32-bit Vista due to the former's improved handling of virtual memory address spaces for applications. "Vista is much better at NUMA - which is what a Quad FX platform architecture is - than XP was, and Vista 64 is better than Vista 32," said Welker. "Vista 64 understands the larger addressing space [so] you can actually easily address more than four gigs - when you put [in] 4 GB, you'll actually get 4 GB [instead of what 32-bit Vista calculates it to be]. When you get that extra width, each application has a larger space when it gets opened. It has a virtual space that gets much larger in Vista 64 than in any other OS."
"The more stuff that we pile into memory, and the more stuff that gets run at the same time, the better off it is to have that memory closer to the processor," added Welker, referring to AMD's on-chip memory controllers - a feature shared by both Quad FX and Opteron.
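Welker's "four gigs" remark comes down to simple arithmetic on pointer width. The Python sketch below is purely illustrative and is not anything AMD or Microsoft published; the helper name `addressable_bytes` is hypothetical. It computes theoretical ceilings only, assuming a flat address space; real operating systems reserve parts of the range for the kernel and for memory-mapped hardware, which is why a 32-bit system with 4 GB installed reports less than 4 GB usable.

```python
# Sketch: how pointer width caps the memory an operating system can address.
# These are theoretical ceilings only; real systems reserve parts of the range
# (for the kernel and for memory-mapped hardware), so usable amounts are lower.

GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(pointer_bits: int) -> int:
    """Return how many distinct byte addresses a flat pointer of this width can name."""
    return 2 ** pointer_bits


ceiling_32 = addressable_bytes(32)
ceiling_64 = addressable_bytes(64)

# 32-bit Windows, by default, splits the 4 GiB range evenly between the
# kernel and each application, leaving a process 2 GiB of its own.
default_user_space_32 = ceiling_32 // 2

print(ceiling_32 // GIB)             # 4
print(default_user_space_32 // GIB)  # 2
print(ceiling_64 // GIB)             # 17179869184
```

The 64-bit ceiling is far beyond any memory a 2007-era server could hold, which is the sense in which each application "has a larger space when it gets opened" under Vista 64.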
What's more, ATI was known to have been working directly with Microsoft, not only in the development of DirectX 10 graphics drivers for Vista, but also on other projects including Windows Presentation Foundation. So it will be interesting to discover why AMD's teraflop system designers chose to go with XP, when the company's own engineers seemed to indicate that Vista could offer significant improvements for just such a design.
Before Intel jumps at the chance, there's one final item to point out: Intel's 80-core concept system displayed last month was observed to run within a 98 W power envelope. We don't yet know which Opterons AMD's test system uses (the company may yet tell us), though Opteron power envelopes range from 33 W to 95 W.
Prototypes of the R600, meanwhile, have been observed to run within a 225-230 W envelope, though sources have indicated that production editions of the chip for ATI's DirectX 10 graphics cards could power down to 130 W. Even so, in the best case (two R600s at 130 W each plus two Opterons at 33 W each), the processing end of AMD's system would draw about 326 W.
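For readers who want to check the power arithmetic, here is a short, purely illustrative Python sketch. It assumes the system contains exactly two R600s and two Opterons, counts only those four chips (no motherboard, memory, or cooling), and plugs in the lowest and highest figures quoted above; the function name `processing_power_watts` is hypothetical.

```python
# Hypothetical helper: sums the processor-side power of AMD's teraflop demo
# using only the figures quoted in this article. Motherboard, memory, and
# cooling are deliberately left out.


def processing_power_watts(gpu_watts: float, cpu_watts: float,
                           gpus: int = 2, cpus: int = 2) -> float:
    """Return the combined power draw of the GPUs and CPUs, in watts."""
    return gpus * gpu_watts + cpus * cpu_watts


# Best case: production R600 parts at 130 W, lowest-power Opterons at 33 W.
best_case = processing_power_watts(gpu_watts=130, cpu_watts=33)

# Worst case: R600 prototypes at 230 W, highest-power Opterons at 95 W.
worst_case = processing_power_watts(gpu_watts=230, cpu_watts=95)

intel_80_core_watts = 98  # envelope reported for Intel's 80-core concept

print(best_case, worst_case)                      # 326 650
print(round(best_case / intel_80_core_watts, 1))  # 3.3
```

Even the best case, 326 W, is more than three times the 98 W envelope reported for Intel's 80-core concept.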
That's impressive by the standards of just two years ago. Yet should a server based on AMD's prototype specs ever reach production, buyers will weigh the costs and benefits of a system whose maximum throughput is a teraflop (though only when running specially compiled applications) against a cluster of low-power 2P Opteron blades. When they do, they may find themselves facing Arlo Guthrie's classic dilemma: whether one big pile is truly better than ten little ones.