AMD's K10: a "dead" product or not?
by Johan De Gelas on May 12, 2008 12:00 AM EST - Posted in IT Computing general
A few years ago it was fashionable to bash Intel's Pentium 4 as a braindead architecture. The fact that the Pentium 4 Northwood (533 MHz FSB) was the best performing processor from mid 2002 until late 2003 in many applications, and that the Pentium 4 Northwood remained competitive until early 2004 was conveniently forgotten: nuances do not make good headlines.
It is now trendy to bash AMD. One "PC doctor" at ZDNet goes as far as to say that:
"When I look at AMD’s current product line, all I see is a forest of
deadness. Intel has products trump every category of products
going. Server, desktop, mobile, low-end, high-end, dual-core,
quad-core. Intel has all these markets stitched up."
Nuances, who needs them when you can make a sensational headline? And indeed, the latest desktop CPU articles here at AnandTech show that Intel's midrange CPUs have a significant lead over the fastest Phenom processors.
Like any design, the K10 is a trade-off. And most trade-offs were made in favor of the applications in the server and HPC market, at the expense of games and other desktop applications.
First take a look at this page which compares a Core 2 Duo E4400 (2 GHz, 2 MB L2 and 800 MHz FSB) with a slower 1.86 GHz Core 2 Duo E6320 (4 MB of L2 and a 1066 MHz FSB). One thing is for sure: games prefer the larger L2 cache. Some of the games were up to 10% faster on the CPU which was clocked 7% lower but had twice the L2 cache. The fact that games prefer a 4 MB L2 is not going to change when you run them on an AMD CPU with an integrated memory controller (IMC). An L2 can deliver the necessary data in 12-20 cycles; an IMC needs about 100 cycles.
Now, take a look at the cache architecture of AMD's K10/Barcelona. If you run a single threaded game on it, it gets a fast 512 KB L2 cache and after that a relatively slow (44-48 cycles!) 2 MB L3. If you know that the same game can benefit from more than 2 MB of cache, it is pretty clear that the 512 KB L2 is not going to cope: you'll end up using the L3 a lot. A dual threaded game might need a little less per thread, but the same problem will happen again: it needs to go to that slow L3 cache all too often. Run that same game on an Intel Core CPU and each thread of your dual threaded game gets a low latency 4 MB (or 6 MB) L2.
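To make the latency argument a bit more tangible, here is a rough back-of-the-envelope sketch in Python. The latencies are the midpoints of the cycle counts quoted above; the hit rates are purely hypothetical numbers picked for illustration, not measurements of any game. The model is deliberately simple: each access only pays the latency of the level that ends up serving it, and the same 100-cycle memory figure is reused for the large-L2 case, which probably flatters an FSB-based design.

```python
def average_access_cycles(levels, memory_cycles):
    """Average cycles per access for a simple cache hierarchy.

    levels: list of (hit_rate, latency_cycles), ordered from the closest
            cache outward. hit_rate is the fraction of the accesses that
            reach this level and are served by it.
    memory_cycles: latency of main memory, which serves whatever is left.
    """
    remaining = 1.0  # fraction of accesses not yet served
    total = 0.0
    for hit_rate, latency in levels:
        total += remaining * hit_rate * latency
        remaining *= 1.0 - hit_rate
    return total + remaining * memory_cycles


# Latencies taken from the text (midpoints of the quoted ranges).
L2_CYCLES = 16    # "12-20 cycles"
L3_CYCLES = 46    # "44-48 cycles"
MEM_CYCLES = 100  # "about 100 cycles"

# HYPOTHETICAL hit rates, chosen only to illustrate the effect.
# Small L2 backed by a slow L3 (K10-like layout):
small_l2_plus_l3 = average_access_cycles(
    [(0.60, L2_CYCLES), (0.75, L3_CYCLES)], MEM_CYCLES
)
# One large L2 and no L3 (Core 2-like layout):
large_l2_only = average_access_cycles([(0.90, L2_CYCLES)], MEM_CYCLES)

print(f"small L2 + L3: {small_l2_plus_l3:.1f} cycles per access")
print(f"large L2 only: {large_l2_only:.1f} cycles per access")
```

With these made-up hit rates, both layouts go to main memory equally often (10% of accesses), yet the small L2 plus slow L3 combination still ends up noticeably slower on average, simply because so many accesses are served at L3 latency instead of L2 latency.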
Now let us imagine that we run 4 threads of an HPC workload on the K10. Each thread has a very limited number of instructions, which fit perfectly in each of the L2 caches. You get 4 threads which together get a total of 4x the bandwidth of one L2. In the case of Intel, each pair of threads has to share the available bandwidth of one L2. The amount of data is huge, so caching the data is hardly possible. The fast IMC does wonders for the K10 chip. Data that is shared between the 4 cores remains in the L3 cache and all L2 caches are kept coherent via an incredibly fast SRI (System Request Interface). So your cache coherency overhead does not increase with the number of caches, it increases per socket. Going from 2 to 4 sockets means that you double the amount of cache coherency traffic. Compare that to the Intel platform where all L2 caches need to be kept coherent.
This is just one example of why we could never expect the K10 chip to be a super desktop chip. But how is Barcelona doing in the server world? Is it limited to an HPC niche market? Well, let us see what Intel thinks. First of all, where do most of the 45 nm chips go? Just a few weeks ago, Anand reported that Intel had no intention of flooding the desktop with 45 nm Core 2 chips quickly.
Those 45 nm chips are going to the server market. Why? Several reasons.
First of all, the server market might be only 20% of Intel's revenue. But look at this:
CPU | ASP | Profit margin (estimate) | Percentage of revenue
Intel Server CPU | >$400 | >$300 | +/- 20%
AMD Server CPU | $300-$400 | $220-$330 | +/- 16%
Intel Mobile/Desktop CPU | $100 | $40-$50 | +/- 80%
AMD Mobile/Desktop CPU | $50-$65 | $5-$30 | >80%
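The point of the table above becomes clearer with a little arithmetic. The Python sketch below is only a rough illustration: it treats the ">$400" and ">$300" lower bounds as exact values, takes the midpoint ($45) of the $40-$50 desktop margin, and reads the "+/-" percentages as exact revenue shares. All of those simplifications are assumptions made here, not figures from Intel.

```python
# Estimates from the table above, simplified to single values (ASSUMPTION:
# lower bounds and midpoints stand in for the quoted ranges).
intel_segments = {
    "server": {"asp": 400.0, "margin": 300.0, "revenue_share": 0.20},
    "mobile/desktop": {"asp": 100.0, "margin": 45.0, "revenue_share": 0.80},
}


def profit_shares(segments):
    """Return each segment's share of total profit.

    The profit of a segment is approximated as
    revenue_share * (margin per chip / average selling price).
    """
    profit = {
        name: s["revenue_share"] * s["margin"] / s["asp"]
        for name, s in segments.items()
    }
    total = sum(profit.values())
    return {name: value / total for name, value in profit.items()}


shares = profit_shares(intel_segments)
for name, share in shares.items():
    print(f"Intel {name}: about {share:.0%} of CPU profit")
```

Under these assumptions, the server segment brings in only a fifth of the revenue but close to 30% of the profit, which is one way to see why the first 45 nm chips end up in servers.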
Secondly, Intel needs those 45 nm chips to be competitive in the HPC market. A 2 GHz Barcelona is capable of keeping up with the best 65 nm Xeons in those applications.
It is pretty clear why AMD focused on the server market. Without a complete redesign it is not possible to beat Intel's integer crunching power and its fast, big L2 cache, and that is exactly what a modern game needs. Barcelona builds further on the K8 architecture and inherited its relatively inflexible integer pipeline. While Core 2 has sophisticated reordering of loads and stores, Barcelona does only limited reordering of loads. While Core 2 offers a 32-entry queue to the integer units, Barcelona has three rather inflexible, separate 8-entry queues.
So the right way forward for AMD was to focus on HPC and server applications where it could leverage its strong points. We can bash AMD for being so late, and for coming up with relatively low clocked CPUs, but even a 2.8 GHz Phenom would not have raised AMD's ASP significantly in the desktop market.
We are almost done with our first round of quad socket benchmarking and we can tell you that we are having a lot more fun than Anand: it is a good old exciting fight between AMD and Intel. Don't believe us? Let Intel do the talking again:
Yes, projecting the bad performance of the desktop chip to say that "AMD's products are a dead forest" is ... just silly. If you have missed the previous entries of our IT blog, just go to it.anandtech.com
74 Comments
Angeloni100 - Wednesday, May 14, 2008 - link
Sure! I know that feeling behind the wheel all too well... But surely it has something to do with performance, comfort, handling, right?
Can't get that feeling from a crappy car (not that AMD is a crappy product)...
All I was trying to say is that you can only get that feeling from something really good, worth the wait..
So, I am the kinda guy that dreams about the next big thing, for instance, I am putting money away to build me an SLI system... can't wait to have one... Specs:
Intel C2D 8500 (great for OCing)
Asus P5N-T Deluxe (Nvidia 780i)
2GB DDR2 1066Mhz Matched Pair Corsair Dominator
2X GeForce 8800GTS 512MB Alpha Dog Edition from XFX
1TB HD From Samsung or Seagate
Been dreaming of this system for a few months now...
But you see, no brand loyalty, just the best possible performance for my money... Thought about going crossfire, but there is no product that will give me this kind of perf. for this kind of money...
Last SLI system I had was a VooDoo 2 system with AMD K6-2 450 back in 98...
Anyway... I just wanted to make these fanboys think a little... that's all...
Thanks for listening
mathew7 - Tuesday, May 13, 2008 - link
Excuse me??? It was only the P4...one product with 3 revisions. And are you sure the chipset you're blaming is an Intel? Are you sure it was not a heat issue (because of the processor)?
I've used many systems from 286, and I can tell you that in the P1 era I switched to Intel chipsets because they were 99% stable compared to 70-80% for VIA/SIS. Since then I bought/recommended exclusively Intel chipsets, except for nforce4 with an Athlon64.
And the Northwood P4 processor was quite good. I admit Athlon64 to be better than P4 in all regards, but I only did this when I found more detailed information. And the only reason for me to bash Intel is for their marketing strategies, which is what kept Prescott alive so much time.
BTW: did you know that Athlon64 was AMD's 1st real success? Don't forget that AMD was a simple factory for Intel in the XT(8086) time.
JPForums - Tuesday, May 13, 2008 - link
The Athlon was AMD's first real success. However, many people don't know that the 386DX40 was AMD's first superior processor. Intel couldn't get the frequency up and thus moved on to the 486 line. They couldn't really fix the problem here either. Meanwhile, AMD got all the way up to the 486DX100 (not sure how to classify the X5-133). So Intel moved on to the Pentium which gave them a lead against the K5. They held this lead until the Athlon.
The Athlon was the first processor AMD made that was based on its own platform. Super socket 7 was an incremental improvement of Intel's socket 7 platforms. It also beat the performance of the PII/PIII competition, though marginally. The Northwood core had a similar advantage over the Barton core as the Athlon had over the PII/PIII. It wasn't until the K8 architecture (and later the Core2 architecture) that we started seeing massive advantages one way or the other.
Stability of platform, as you said, was the major concern. With that in mind, I believe Intel makes the best chipsets on the planet. nVidia and ATI have narrowed the gap into insignificance now, but Intel is by no means behind any other chipset for either Intel or AMD platforms. (Particularly in the area of stability)
Locutus465 - Tuesday, May 13, 2008 - link
Currently they're behind in both IGP performance and the ability to accelerate HD video content with the IGP... AMD is the current winner here, with nVidia set to shake things up a little.
JPForums - Tuesday, May 13, 2008 - link
The nVidia and ATI(AMD) chipsets are both new. Wait for Intel's new IGP chipset to compare as it'll be out in the not too distant future.
I do wholeheartedly admit that Intel's graphics performance is the craps, but if you want performance, you aren't going to use an IGP anyways. Also, businesses already seem to be under the assumption that Intel graphics is "good enough", so there is no real advantage there.
If Intel's HD decode capabilities end up "good enough" for most people, then there will be no real change of power. However, if they botch it, then there will be an opening in the small HTPC market that Intel is less optimized to fill.
My main point was stability anyways. From that perspective, the new chipsets from nVidia and ATI are leagues behind (at least until they mature).
Locutus465 - Tuesday, May 13, 2008 - link
Are you bringing drivers into the picture as well here? AMD vista drivers have been rock solid where basically everyone else has been struggling. Finally things are settling out, but AMD seemed to have things right since day 1.
JPForums - Tuesday, May 13, 2008 - link
I had overlooked Vista drivers.
I had a poor experience with an Asus board based on the 780G chipset under Vista, but that is as far as my experience goes with IGPs under the operating system.
As I've only dealt with one 780G based board and neither the latest nVidia nor the latest Intel IGP based chipsets, I have to concede that the 780G might be the most stable IGP on that platform.
However, I've had no trouble with X38/X48/P35 boards under Vista. I have had issues with some of nVidia's (non-IGP) offerings, and to a lesser extent, AMD/ATI's offerings. Nothing insurmountable though. As I said, the gap is insignificantly small.
TA152H - Tuesday, May 13, 2008 - link
The XT was 8088 based, and AMD was a licensed second source for that and later processors (up to the 286), but it was not the only product they sold and they were not a factory of Intel's.
AMD's 386s were considerably better than Intel's as well, and although they were out the same time the 486 was, they were very attractive chips.
Their 486s were also pretty popular, particularly as upgrades.
The K5 was a disaster, and the K6 kept them alive but little more. The K7, however, was a considerably better performing processor than the Pentium III, particularly the Katmai, mainly because it could run at significantly faster clock speeds and had a superior floating point unit.
The Athlon 64 was arguably one of AMD's failures, as it was too small of an increase in performance over the Athlon. It's not hindsight, I was stunned in a negative way at how poor a processor it was when it came out. The brainless masses didn't see it, because the Pentium 4 was a horrible chip, and the Prescott version was even worse. So, the nitwits thought AMD did a good job, but it was really poor and should have been obviously so, since Intel's mobile line always had great performance and low power use. It's just no one ever used them in the comparisons.
The Northwood being a good chip is a fallacy that people try to advance to show how much better they understand the market, and how overly simple most people are. The latter is true, but also saying the Northwood was so good is also an oversimplification. It was a huge, power hungry chip that was generally more powerful than the other company's much older design, but not always. It's the same argument I would use against the Athlon though, it used a lot more power, and was much bigger, but the performance advantage was more substantial since it could match the P III clock for clock (and greatly surpassed it in FP), and could also run at much higher clock speeds. The Northwood was enormous, and had miserable IPC, and didn't outperform the Athlon by very much.
In the end, the Pentium III design as it moved along the mobile route proved to be an excellent and balanced design, and is finally the dominant processor again in the current iteration of Penryn. The Athlon and Athlon 64 only looked good against the grotesque Pentium 4 line, which combined the twin virtues of miserable IPC and huge size/power use. Why Intel ever used this chip, after seeing how good the Pentium Ms were, is a mystery to me.
JPForums - Tuesday, May 13, 2008 - link
So, a 10% - 25% (Averaged about 16% in my experience) improvement over K7 (at the same frequency) when comparing the K8 with a single (64bit) memory controller is a failure. Add an extra 5% for the dual (128bit) memory controller. By that metric, the only success in the industry was the Core2 and only when you compare it to the netburst architecture that you state was horrible. I suppose K6 to K7 might have been considered successful in certain areas as well.
I agree that the Athlon64s got more credit than they should have due to the underwhelming performance of the P4s, but having made the switch from an AthlonXP 3200+ (2.2GHz) to a socket 754 Athlon64 3200+ (2.2GHz), I just can't understand where you are coming from.
I can think of a host of architecture releases that were less impressive. My switch from PII to PIII for instance was much less impressive. Athlon to AthlonXP wasn't exactly impressive either. Going to the Athlon from a K6-III seems to be the only switch short of the P4 to Core2 switch that even compares. I'm not even sure the difference is that large when you compare Core2 to Core (PentiumM). The Athlon64 architecture didn't end at double the clock rate that the AthlonXP did, but you said you felt that way at launch. Most new architecture releases aren't launched with clock speeds that much higher than the previous ones anyway. The P4, as you put it, was horrible anyways.
If you look at the server/workstation market (the opteron launched first) then the launch is more impressive. The only place I see that the K8 architecture disappointed was mobile use. I'll concede your point in the mobile space.
Oh, the reason they didn't use the PentiumMs (Core architecture) in the desktop realm was they couldn't get clock speeds high enough to match the performance of the Athlon64s. The best overclocking results that I saw from Dothan was about 2.6GHz (yes, a few manufacturers actually did make desktop boards for Pentium Ms). They would have released retail versions no higher than 2.4GHz in the desktop space and probably only as an extreme edition processor. This puts them on par with an Athlon64 4000+ (2.6GHz). At that point, their P4 lineup performed better and nobody seemed to care that they could fry eggs on it.
Locutus465 - Tuesday, May 13, 2008 - link
I really have to disagree with your assessment of K8... Fact is while it wasn't significantly better clock for clock in 32bit mode it added in a wildly successful 64bit mode that obliterated Intel's Itanic aspirations and very quickly ushered in an era of 64bit computing for the "Wintel" world. If it wasn't for K8 we'd almost certainly still be living in a 32bit world right now very quickly running up against a wall...
If you really want to try to bash AMD for a CPU design why not bash K7 for being slower (clock for clock) than the K6-III with regard to integer performance. In fact to date I think the K6-III remains one of the fastest x86 CPUs for integer performance. Personally I'm ok with this since they've more than made up for that with much higher clocks, vastly improved FPU and a host of other innovations.
What I'm looking forward to now are AMD's platform innovations, they're doing a really good job in this arena, which is actually probably part of the reason why CPU performance is lagging a little. Good things are coming, if AMD can hold long enough to make spider a mature platform I'm sure they'll revisit K architecture and speed things up quite a bit.