Barcelona: AMD Gambles on an Evolutionary, Not Revolutionary, CPU

Barcelona Issue #2: Were All AMD's Design Choices 'Smarter'?
Two years ago, when the market judged a number of Intel's design decisions as bad (case in point: hyperthreading), AMD held the price/performance lead. It could take Intel to task publicly for the deficiencies of its NetBurst architecture - deficiencies Intel would soon eradicate.
Now with the performance margins between the two companies so close, Intel may take this opportunity to do some architecture bashing of its own. AMD, we discovered, is prepared to defend its design choices, including both the recent ones and those it made back in 2002 when its x86 multicore architecture was first launched.
DDR2, not DDR3. One very noticeable difference distinguishing AMD's quad-core platform approach from Intel's is that Opteron has refrained from jumping onto the DDR3 memory bandwagon. DDR3 would double peak memory bandwidth, from DDR2-800's 800 MT/s per pin to 1600 MT/s (although DDR2-1066 specifications are forthcoming, and AMD plans to support them).
It also reduces the voltage along the rail from 1.8 V to 1.5 V - a reduction you'd think AMD would want. Intel is banking on customers' willingness to write off their DDR2 memory investments early, if necessary, to take advantage of the speed boost and power efficiency.
But lab tests have shown that boost to be somewhat less than double. In fact, the latencies introduced in DDR3 to compensate for slower system buses are so pronounced that, especially with older chipsets, any speed difference you might see is negligible, and for some applications there may be no difference at all.
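The bandwidth-versus-latency trade the article describes is easy to see in rough numbers. A minimal sketch (not from the article; the CAS-latency figures are typical JEDEC values of the era, assumed here for illustration) compares a common DDR2-800 part against an early DDR3-1066 part:

```python
# Hypothetical back-of-the-envelope comparison of DDR2 vs. early DDR3.
# Transfer rates are per-pin; a standard DIMM channel is 64 bits (8 bytes) wide.

def peak_bandwidth_gb_s(transfers_per_sec, bus_width_bytes=8):
    """Peak bandwidth of one 64-bit memory channel, in GB/s."""
    return transfers_per_sec * bus_width_bytes / 1e9

def cas_latency_ns(cas_cycles, transfers_per_sec):
    """Absolute CAS latency in nanoseconds: CAS cycles divided by the
    I/O clock, which runs at half the transfer rate (double data rate)."""
    clock_hz = transfers_per_sec / 2
    return cas_cycles / clock_hz * 1e9

ddr2_800  = (800e6, 5)     # 800 MT/s, CL5  (typical for the period)
ddr3_1066 = (1066e6, 7)    # 1066 MT/s, CL7 (typical early DDR3 part)

print(peak_bandwidth_gb_s(ddr2_800[0]))    # 6.4 GB/s
print(peak_bandwidth_gb_s(ddr3_1066[0]))   # ~8.5 GB/s
print(cas_latency_ns(*ddr2_800))           # 12.5 ns
print(cas_latency_ns(*ddr3_1066))          # ~13.1 ns
```

Under these assumed timings, the early DDR3 part moves more data per second but actually takes slightly longer to answer an individual request - which is consistent with the article's point that latency-sensitive workloads saw little benefit.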
Last May, when AMD announced the development of its quad-core Phenom series (with much the same architecture as quad-core Opteron), BetaNews asked product manager Ian McNaughton why his company wasn't pushing towards DDR3 as fast as Intel. The answer, he said, was that skeptical customers had seen the early test results and weren't buying into it.
"The reason why our competitors want to go to DDR3 so quickly is because they need that speed hit, the frequency hit in their front-side bus," McNaughton told us then, "to be able to not fall behind from a performance perspective any further. So they're going to try to push the market into DDR3 when the market doesn't want to go there, because there's no performance benefit to go to DDR3 over DDR2 that we can really ascertain at this point in time."
John Fruehe told us AMD does plan on supporting DDR3...just not now. For his customers, the reason is a little different, having more to do with differences you can measure than with those you can feel.
"In addition to wanting to make sure we've got a stable platform, the worst thing in the world that you can do is force your customers into making a memory transition early on in the cycle," Fruehe remarked. "Because you're probably aware, memory has that bathtub effect in pricing. It's high at the beginning, you hit critical mass, it drops back down again and it's nice and low; you get to end-of-life and it springs back up again. Well, the worst thing you could ever do is force a customer into a memory transition at the front end of that curve, because you're forcing them to buy the expensive memory."
In other words, it's a matter of timing. Hardware analysis firm iSuppli believes DDR3 will comprise about 25% of all DRAM shipments in 2008. With that level of demand, prices will probably sink toward the bottom of the basin, and that's where AMD will likely pick it up - one architectural generation after mid-2008's 45 nm Shanghai, with what the company is currently calling "Sandtiger," its first octal-core server CPU line.
No fully-buffered DIMMs. Ever since it started cranking out multicore server processors, Intel has been investing in more expensive, fully-buffered DIMMs (FB-DIMM) for reasons it attributes to server reliability and integrity. Intel's boilerplate description of the technology reads like this: "Fully-buffered dual in-line memory technology allows for better memory capacity, throughput and overall reliability. This is critical for creating balanced platforms using multiple cores and the latest technologies, such as virtualization, to meet the expanding demand for compute headroom."
It's a familiar Intel strategy: one new innovation drives the need for the other, which drives the demand for a third, hopefully leading to an increase in demand for the first innovation in the loop.
But AMD is having none of it, and in fact is trying to write it off as another wasted innovation for a cheap performance hit, like hyperthreading. John Fruehe: "I think we all saw how well FB-DIMM really panned out in terms of memory. It was expensive, it was hot, it drew a lot of power and it was harder to get. Other than that, it was probably a great idea. But the problem with fully-buffered DIMMs is, it never really panned out as a relevant memory technology.
"If you look at the performance of my platform," he continued, "the fact that I have an integrated memory controller means memory latency and memory throughput are going to be so much further ahead for our platform, relative to our competitor, that you can't even compare it. It's an order of magnitude better."
The 8 MB L3 cache. When Intel added a colossal 16 MB cache to its Tulsa line of Xeon MP processors last year, AMD chalked it up as another "fake" innovation for a cheap performance hit. Intel needed all that L3, it argued, to compensate for the memory-timing latencies introduced by an off-chip memory controller that had to marshal between two sandwiched-together CPU dies.
Now, AMD is adding a shared L3 cache, while Tigerton does away with its own. Barcelona's is not as big as Tulsa's, but it's still fairly large, and it sits on top of four dedicated 512 KB L2 caches, one per core. If Intel's Tulsa L3 was a cover-up, could AMD be trying to hide something...perhaps half as problematic?
John Fruehe very skillfully explained Barcelona's L3 cache not as something that makes up for an AMD deficiency, but for one of Intel's. "What happens is, if you've got two dies there," he explained, referring to Intel's double-dual-core sandwich, "each has two processor cores in it. If one of the processor cores in the first die needs data that's sitting in the cache of the second die, all that communication has to go outside the chip and through the external memory controller to be able to pass that data back and forth. Whereas in the Opteron world, because we've got that integrated cache that's shared and you've got four processor cores all on the same die, and that crossbar switch, what you get is communication between the cores that actually happens all within the die. And you get much better scalability."
Barcelona Issue #3: Can AMD still claim a power advantage?
This is an extremely important issue for AMD going forward, which we'll cover in a separate article today. The problem is this: Intel has, by all accounts, leveled the playing field with regard to power. It's currently offering Xeon MP processors with lower TDP ratings than AMD's, and will continue to do so even after the low-power Barcelona CPUs are released.
AMD has to try to minimize the damage, so today it will begin using a new metric called "Average CPU Power" - initially alongside TDP, soon in place of it - whose values will be somewhat lower than Intel's TDP numbers. Long-time AMD builders may be reminded of the time when it classified its Athlon series using performance ratings relative to Intel Pentium clock speeds, even though the chips' own frequencies were slightly lower. That trick worked in the consumer market for a time, but the unanswered question is how long a similar "re-scaling" technique will work in the server market, where measurement is everything.
Next: Barcelona Issue #4: Will four cores make that much difference?