First Cell-Based Computer Announced
Late yesterday, IBM announced it is finally making available the first general purpose computing system to utilize the Cell processor. Proving the Cell is not just for game consoles, IBM is infusing its high-performance System Cluster 1350 setup with a Cell-based BladeCenter QS20 option.
It's being marketed as a device for "compute-intensive" operations, which confirms expectations that Cell would be introduced on the high end, and touted for its number-crunching ability. Each QS20 blade will feature a pair of Cells, each of which is what the STI coalition -- Sony, Toshiba, and IBM -- describes as a "multi-element" processor, rather than "multicore."
A single processor (in this setup) includes one element, the Power Processor Element (PPE), that's essentially a dressed-up Power or PowerPC core. Its job is to analyze the task at hand, then identify and isolate the repetitive portions that best lend themselves to parallel operation. (Most compute-intensive operations, it turns out, are iterative.) Those portions are then delegated by the PPE to up to eight so-called "synergistic processing elements" (SPEs), which in a sense consume the partly digested tasks the PPE produces for them.
The process is a lot more similar to the delegation of tasks in a graphics processor than to that in an Intel or AMD multicore processor, although the PPE/SPE relationship in a Cell is much more broadly defined. When multiple Cell processors are combined, the PPEs are engineered to work together, so they can layer the delegation of tasks among successive tiers of SPEs.
In other words, a Cell can break down tasks, then break them down again if more SPEs are available. As a result, engineers have found, the efficiency of a Cell-based system can rise closer to exponentially than linearly as more SPEs become available. More accurately, Cell systems may be less susceptible to efficiency drop-offs as system size increases, though the true test of that theory comes now.
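The split-and-delegate pattern described above can be sketched in a few lines of Python. This is only an analogy, not Cell code: ordinary threads stand in for SPEs, and the names `split_evenly`, `sum_of_squares`, and `delegate` are hypothetical, chosen for illustration. Only the worker count of eight comes from the article.

```python
from concurrent.futures import ThreadPoolExecutor

SPE_COUNT = 8  # from the article: up to eight SPEs per Cell


def split_evenly(items, parts):
    """Coordinator step (the PPE's role in this analogy): cut the work
    into at most `parts` contiguous chunks of nearly equal size."""
    parts = max(1, min(parts, len(items)))
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one leftover item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Worker step (the SPE's role): a repetitive kernel that runs on
    one chunk without needing anything from the other chunks."""
    return sum(x * x for x in chunk)


def delegate(items, workers=SPE_COUNT):
    """Split the job, hand one chunk to each worker, combine the results."""
    chunks = split_evenly(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(delegate(list(range(100))))
```

The layering the article mentions would amount to a worker calling `delegate` again on its own chunk when more workers are free; that step is left out here to keep the sketch short.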
"Increasing frequencies and deeper pipelines have reached diminishing returns on performance due to issues with power consumption/dissipation and memory latencies," IBM said on Thursday. "The QS20 addresses this problem head-on with two 3.2 GHz Cell BE processors on the blade."
Incidentally, this is the same clock speed as will be used for Sony’s PlayStation 3. Each PPE has 512 KB of L2 cache, while each SPE has 256 KB of what STI describes as "local store memory," which is part of a unique, three-tiered memory structure that may get its first serious test with the QS20. With 256 KB all to itself, each SPE operates as a little, self-contained computer; and since it only has to deal with user application-oriented tasks and never with the operating system, IBM engineers say they can concentrate those efficiency benefits on the tasks the user actually sees. The PS3 reportedly uses only seven of the eight SPEs available, reserving #8 as a spare.
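One practical consequence of a 256 KB local store is that a large data set has to be fed to an SPE in pieces small enough to fit. The Python sketch below shows that arithmetic only. The function name `plan_tiles` and the 64 KB set aside for program code and stack are assumptions made for illustration, not figures from IBM.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE local store size given in the article


def plan_tiles(total_bytes, reserved_bytes=64 * 1024):
    """Return (offset, length) pairs covering `total_bytes` so that each
    piece fits in the local store.

    `reserved_bytes` is a hypothetical allowance for code and stack,
    which would have to share the same 256 KB as the data.
    """
    budget = LOCAL_STORE_BYTES - reserved_bytes
    if budget <= 0:
        raise ValueError("reserved_bytes leaves no room for data")
    return [
        (offset, min(budget, total_bytes - offset))
        for offset in range(0, total_bytes, budget)
    ]


if __name__ == "__main__":
    for offset, length in plan_tiles(1024 * 1024):
        print(offset, length)
```

Under these assumptions, a 1 MB buffer is split into six pieces: five of 192 KB and a final one of 64 KB.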
A little phrase that IBM engineers use to benchmark efficiency gains is “Gelsinger’s Law,” referring to Intel Senior Vice President Pat Gelsinger. It was Gelsinger who pointed out that overall throughput increases by only about 40 percent every time the number of transistors on a processor doubles in accordance with Moore’s Law.
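To see why that 40 percent figure makes a useful target to beat, it helps to compound it. The short Python sketch below assumes only the rule as stated above, a 1.4x gain per doubling; the function names are hypothetical.

```python
import math

GAIN_PER_DOUBLING = 1.40  # the 40 percent figure attributed to Gelsinger


def throughput_gain(doublings):
    """Overall throughput multiplier after a number of doublings."""
    return GAIN_PER_DOUBLING ** doublings


def relative_efficiency(doublings):
    """Fraction of ideal (linear) scaling actually achieved."""
    return throughput_gain(doublings) / (2 ** doublings)


# Equivalent power-law exponent: throughput grows as resources ** exponent.
SCALING_EXPONENT = math.log2(GAIN_PER_DOUBLING)

if __name__ == "__main__":
    for d in range(1, 4):
        print(2 ** d, throughput_gain(d), relative_efficiency(d))
    print(SCALING_EXPONENT)
```

After three doublings (eight times the resources), the rule predicts roughly 2.74 times the throughput, or about 34 percent of what linear scaling would deliver. The equivalent exponent is a little under 0.5, so throughput grows roughly with the square root of the resources added.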
IBM uses this benchmark as a sort of tease, to prove the Cell can do better. Soon, we’ll be able to find out for ourselves, as we finally see how a Cell system performs against Xeon and Opteron in the same environment.