Er, gosh. Dunno what to make of the preceding discussion. Eh, they don't scare me--I'll post anyway.
Even though the title of this article is "Quad core for the masses", the benchmark is for enterprise database applications. Because of the title, I had expected some workstation benchmarking. Any plans for doing benchmarks for scientific and visualization applications? From bio-tech (BLAST, etc.), to fluid dynamics, to 3D rendering. That sort of thing.
Intel's attempt to use two dual cores on a slice of silicon and call it a quad core shows how easily they can manipulate the media with foolishness. Only a fool would buy Intel's inferior 2+2 design when they can have Barcelona and its many superior derivatives.
Riiight... only a fool would get QX6700 right now when Barcelona isn't out. Putting two chips in a package has disadvantages, but there are certainly instances where it will easily outperform the 2x2 Opteron, even in eight-way configurations. There are applications that are not entirely I/O bound, or bandwidth bound. When it comes down to the CPU cores, Core 2 is significantly faster than any Opteron right now.
As an example, a 2.66 GHz Clovertown (let alone a 3.0 GHz Xeon) as part of a 3D rendering farm is going to be a lot better than two 2.8 GHz (or 3.0 GHz...) Opteron parts. Two Xeon 5355 will also be better than four Opteron 8220 in that specific instance, I'm quite sure. The reason is that the 4MB-per-chip L2 is generally enough for 3D rendering. There are certainly other applications where this is the case, but whether they're more common than the reverse (i.e. 4x2 Opteron being faster than 2x4 Xeon) I couldn't say.
AMD isn't really going to have a huge advantage because of native quad core with Barcelona, and Intel wouldn't get a huge boost by having native quad core either. If you thought about it more, you would realize that the real reason Intel's quad core chips have issues with some applications is that all four cores are pulling data over a single FSB connection - one connection per socket. Intel has to use that single FSB link for RAM, Northbridge, and inter-CPU communications.
In contrast, AMD's "native quad core" will have to have all four cores go over the same link for RAM access (a potential bottleneck). They can use another HT link to talk to another socket (actually two links), and they can use the third HT link to talk to the Northbridge. The inter-CPU communication generally isn't a big deal, and Northbridge I/O is also a much smaller piece of the bandwidth pie than RAM accesses. It's just that AMD gets all the RAM bandwidth possible. AMD could have done a "two die in one package" design and likely had better scaling than Intel, but they chose not to.
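To put rough numbers on that (my own back-of-the-envelope figures, assuming a 1333 MT/s 64-bit FSB on the Intel side and dual-channel DDR2-667 per Opteron socket; neither spec comes from the posts above):

```python
# Back-of-the-envelope per-socket bandwidth sketch (assumed speeds, purely illustrative).
# Intel: one FSB per socket carries RAM, Northbridge, and inter-CPU traffic.
# AMD: each socket has its own dual-channel DDR2 controller; HT links carry the rest.

FSB_TRANSFERS_PER_SEC = 1333e6   # assumed 1333 MT/s front-side bus
FSB_BYTES_PER_TRANSFER = 8       # 64-bit bus

DDR2_TRANSFERS_PER_SEC = 667e6   # assumed DDR2-667
DDR2_BYTES_PER_TRANSFER = 8      # 64-bit channel
CHANNELS_PER_SOCKET = 2          # dual channel

fsb_gb_s = FSB_TRANSFERS_PER_SEC * FSB_BYTES_PER_TRANSFER / 1e9
ram_gb_s = DDR2_TRANSFERS_PER_SEC * DDR2_BYTES_PER_TRANSFER * CHANNELS_PER_SOCKET / 1e9

print(f"Intel FSB (RAM + chipset + inter-CPU): ~{fsb_gb_s:.1f} GB/s per socket")
print(f"AMD local RAM alone:                   ~{ram_gb_s:.1f} GB/s per socket")
```

The raw numbers land in the same ~10.7 GB/s neighborhood, but on the Opteron that figure is for RAM alone and repeats for every socket, while on the Xeon everything has to fit down the one pipe.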
And of course Intel will be going to something similar to HyperTransport with Nehalem in 2008. Even they recognize that the single FSB solution is getting to be severely inadequate for many applications.
quote: As an example, a 2.66 GHz Clovertown (let alone a 3.0 GHz Xeon) as part of a 3D rendering farm is going to be a lot better than two 2.8 GHz (or 3.0 GHz...) Opteron parts. Two Xeon 5355 will also be better than four Opteron 8220 in that specific instance, I'm quite sure
Actually, that's not true Jarred.
Johan's test (http://www.anandtech.com/showdoc.aspx?i=2897&p...) benchmarked exactly that scenario, and C2D was equal at 4 cores and slightly slower at 8 cores. This was a 2.33 GHz Clovertown vs the 2.4 GHz Opterons...
Let me add that there are cases where it could be true, but only when the apps don't scale at all...and in that case, even a single or dual core sometimes beats the Clovertowns.
Okay, wrong example then. Heh. The point is I am sure there are benchmarks where the FSB bottleneck isn't as pronounced. Anything that can stay mostly within the CPU cache will be very happy with the current Xeon 53xx chips. Obviously, the decision as to what is important will be the deciding factor, so companies should research their application needs first and foremost.
Getting back to the main point of the whole article, clearly there are areas where Opteron can outperform Xeon with an equal number of cores. Frankly, I doubt 1600 FSB is going to really help, hence the need for the new high speed link with Nehalem on the part of Intel. K10 could very well end up substantially ahead in dual and quad socket configurations later this year, even if it only runs at 2.3 GHz. I guess we'll have to wait and see... for all we know, the current memory interface on AMD might not actually be able to manage feeding quad cores any better than Intel's FSB does.
quote: The point is I am sure there are benchmarks where the FSB bottleneck isn't as pronounced. Anything that can stay mostly within the CPU cache will be very happy with the current Xeon 53xx chips
Actually, it appears (at least from the stuff I've seen so far) that the only apps that aren't affected by the bottleneck are the ones that are just as good on a dual core...in other words they don't scale well.
I agree with the AMD exec who intimated that AMD made a HUGE mistake in not coming out with an MCM quad chip in November...I think that the benches would have been nicely into the Opteron side of things well before Barcelona, but of course only on the quad chip.
quote: I doubt 1600 FSB is going to really help, hence the need for the new high speed link with Nehalem on the part of Intel
I absolutely agree...I've been saying for the last year that AMD will most likely retake the lead again (even against Penryn), but that Nehalem is a whole nother ballgame...
quote: for all we know, the current memory interface on AMD might not actually be able to manage feeding quad cores any better than Intel's FSB does
I suppose that's possible, but if it were true then I think every executive at AMD would have dumped all of their shares by now. :)
That's just as valid as saying it's possible that there's a flaw in Penryn when it gets over 2.8 GHz...possible, but I strongly doubt it.
I'm not sure why you guys don't think an increase in FSB and memory bandwidth (i.e. 1600) is going to help. It seems beyond obvious it will. Will it help enough is the only question.
With regards to the 2+2 from Intel, why does anyone really care? In some ways it's better than a true four in that you can clock them higher because you can pick pairs that make the grade, instead of hoping that all four cores can clock really high. If one of the four can't, well, the whole thing has to be degraded. With Intel's approach, if one set of the cores is not capable at a certain speed, you just match it with one that is fairly close to it and sell it like that. It allows them to clock higher, and sell them less expensively than they would if they made a big quad-core die. The performance is excellent too, so it's a pretty good solution.
Why would AMD not have problems with Quad-Cores similar to Intel? You still have four cores sucking data through one memory bus, right? Or am I missing something? Is AMD going to have a memory bus for each core? That seems strange to me, so I'm going to assume they are not. The memory controller and point to point bus don't fundamentally change that problem. This comparison was fairly grotesque in that it made the memory subsystem for the Opteron seem better than it was. You had eight cores, yes, but only two cores were ever fighting for the same bus, since there were four sockets and each socket has its own point to point connection to memory. That's the advantage. If you have more sockets, the AMD solution will scale better, although NUMA has horrible penalties when you leave the processor's own memory. If you add more cores to the same socket, you still have fundamentally the same problem, and point to point really isn't going to change that. You have four cores hitting the same bus either way.
With regards to FSB, remember it's also the reason why Intel processors have more cache. It's not a coincidence Intel processors have more cache, it's because AMD uses so much room on the processor for the memory controller. Intel decided they'd rather use the transistors for other things. I'm not speculating either, Intel has actually said this. Intel could have added a memory controller a long time ago, but they didn't. In fact, in the mid 1990s there was a company called NexGen (which AMD bought because they couldn't design a decent processor from scratch at the time, and had a lot of problems with the K5 that alienated companies like Compaq) which had an onboard memory controller with the NX586. Jerry Sanders decided to can it for the NX686 and use a standard Socket 7 platform instead of NexGen's proprietary one for what became the K6. The K6-III+ is a really interesting chip, you can actually change the multiplier on the fly without rebooting (I still use it for some servers, for exactly that reason).
quote: I'm not sure why you guys don't think an increase in FSB and memory bandwidth (i.e. 1600) is going to help. It seems beyond obvious it will. Will it help enough is the only question
Certainly it will help...but keep this in mind (going towards your question at the end):
1. Both this review and the one Johan did show the old K8 clearly doing as well or better than C2D across the board already (with 4 cores or more)...and Johan's numbers were on an Opteron using very old PC2700 memory as well (Jason and Ross didn't list their memory type).
2. While Barcelona will be HT 2.0, it will be the last one at this speed...the rest of the K10s (the ones that Penryn will be competing with) will be HT 3.0. In other words, while the FSB of Penryn systems will be raised from 1333 to 1600, the K10s will be going from 1 GHz to between 1.8 and 2.6 GHz...
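To quantify that using only the numbers mentioned above (raw clock bumps only, setting aside link widths and efficiency):

```python
# Relative clock increases quoted above, nothing more.
fsb_gain = 1600 / 1333     # Penryn-era FSB: 1333 -> 1600 MT/s
ht_gain_low = 1.8 / 1.0    # HT 3.0 low end: 1.0 -> 1.8 GHz
ht_gain_high = 2.6 / 1.0   # HT 3.0 high end: 1.0 -> 2.6 GHz

print(f"Intel FSB bump:  {fsb_gain:.2f}x")
print(f"AMD HT 3.0 bump: {ht_gain_low:.1f}x to {ht_gain_high:.1f}x")
```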
quote: With regards to the 2+2 from Intel, why does anyone really care?
Mainly because of the way it affects Intel's interchip communication. Remember that as apps become more parallel, they also require more communication between the cores. One of the great advances in C2D was the shared cache, the other was the Bensley platform (individual connections to the MCH). However, with an MCM quad core, the only path for one half of the chip to talk to the other half is through the FSB (MCH). In essence, you have 2 caches (each DC has a single cache, and there are 2 DC per CPU) per MCH connection, so we are back to a shared FSB again (in fact 2 shared FSBs). This recreates the bottleneck that the shared cache and Bensley were designed to get rid of...
quote: if one set of the cores is not capable at a certain speed, you just match it with one that is fairly close to it and sell it like that
Ummm...that's not how they manufacture their chips (and it would be outrageously expensive to do so!). The testing occurs after the cores have been placed on the chip...
quote: Why would AMD not have problems with Quad-Cores similar to Intel? You still have four cores sucking data through one memory bus, right? Or am I missing something?
Yes, you are...
First is the interchip communication I spoke of. HT allows for direct connections between the caches of different chips, and the chips themselves have the cache directly connected on-die through a dedicated internal bus. That bus has 2 memory controllers connected directly to system memory as well as its own dedicated HT connection (called cHT) to other caches. Remember that contrarily, Intel must route everything through the single MCH...
quote: It's not a coincidence Intel processors have more cache, it's because AMD uses so much room on the processor for the memory controller. Intel decided they'd rather use the transistors for other things
Actually, the reason Intel gave for not having an on-die memory controller is that memory standards change too quickly. But what they didn't say is that it takes many years (about 5 on average) to design and release a new chip, and an on-die memory controller is a major architectural change. That's why we don't see it on C2D, but we will see it on Nehalem...
Are you making this stuff up, or going by what Intel has said?
Intel has said that the reason they haven't gone with the on-board memory controller, with respect to the Core 2, is because they preferred to use the silicon for the cache and other things. I think a lot of it is because they sell a lot of IGPs, and didn't want the awkward arrangement of either adding another memory controller outside of the processor, or having to use the processor's memory controller since the IGP doesn't have its own memory. The last part is speculation on my part; Intel said they preferred to use the transistors differently, and used cache as an example.
Your argument has now become comparative, rather than absolute, going back to what I am saying about it helping enough. Also remember that the Penryn will have larger caches, which helps mitigate this problem since you will have less contention. Both together should make a reasonably large impact on bandwidth restricted situations.
With regards to 2+2, actually, you're wrong on that. That's exactly what Intel said. They commented that they are able to run them at higher clock speeds than they could if they went native four, since they can test before they are all together rather than have to downbin, or throw away, a whole part if one of the dual cores is a failure or can't clock high. It's not speculation on my part.
Apps becoming more parallel is kind of a bad joke that people who are clueless talk about. Multithreading has been around since 1988 with OS/2, and back then I was writing multithreaded apps. Even for single processors, you did this, because good programmers wanted their application to always be responsive to the user even when you were doing things for them. Admittedly, Windows was quite a bit behind, but multithreading is nothing new, and there are limitations to it that simply can't be overcome. For some applications, it works great, for others you can't use it. Multiple cores are fairly new mainly because AMD and Intel can't think of anything better to do with the transistors, but multiprocessor computers are not, and people have been writing applications for them for many, many years (myself included). ILP applies to everything, TLP does not, and is essentially an admission from CPU makers that they are in a very, very diminishing returns situation with regards to transistors and performance.
With regards to the shared cache, you are also incorrect in saying it is why the Core 2 is so fast. It's a tradeoff, and you seem to ignore that the L2 now has four more wait states because it is shared by two processors. I'm not sure how many more they'd have to add if it were shared among four cores, but it wouldn't be a free lunch.
Also keep in mind, theory sounds great, but where the rubber meets the road, the Clovertown does really well, and the main limitations have nothing to do with the trivialities of having a 2+2. In apps that can use it, the quad core shows a dramatic improvement over the dual. The FSB problems show up in these benchmarks rather vividly though, not a percentage or two that aren't that easily noticed.
I don't "make stuff up", mate...
"Intel does not integrate the memory controller. One reason is that memory standards change. Current Athlon computers, for instance, don't come with DDR II memory because the integrated memory controller connects to DDR I. Intel once tried to come out with a chip, Timna, that had an integrated memory controller that hooked up to Rambus. The flop of Rambus in the market led to the untimely demise of the chip" http://news.com.com/2061-10791_3-6047412.html">News.com story
While they also listed the large cache and space as a "reason", this was the reason they mentioned most often in interviews.
If by your insinuation you were questioning how long it takes to build a chip, I'm afraid that is just a result of many years of industry knowledge on my part (though if you ask anybody who works in the semi industry, they will confirm this for you).
Nehalem for example began its design almost 6 years ago, and has been delayed because of necessary architectural changes (similar to the way Itanium was).
quote: Also remember that the Penryn will have larger caches, which helps mitigate this problem since you will have less contention
Actually, the large cache doesn't help at all with the MCH bottleneck problem...in fact it makes it slightly worse. Remember that the data path for interchip communication is from cache to cache, not from system memory to cache. The larger cache (with the help of a good prefetcher) certainly helps reduce memory latency (though not as much as an on-die controller)...
quote: but multithreading is nothing new, and there are limitations to it that simply can't be overcome...For some applications, it works great, for others you can't use it. Multiple cores are fairly new mainly because AMD and Intel can't think of anything better to do with the transistors
Actually, multi-cores have been around for a while...The Power4 was dual core back in 2000. What's new is that mainstream consumer level apps are being written for TLP because single cores are to be phased out...
quote: ILP applies to everything, TLP does not, and is essentially an admission from CPU makers that they are in a very, very diminishing returns situation with regards to transistors and performance
Not true...Intel tried to convert everything to ILP with Itanium and EPIC, but it was the market (and in many cases the software companies) that decided that it was too hard and too expensive for not enough gain. Most (if not all) software companies are now developing for greater TLP efficiency, as this allows a much smoother transition (evolutionary vs revolutionary).
Sure multithreading has been around for a long time, I used many programs on my old Amiga that were multi-threaded...but it's a matter of degree.
To use an analogy, when I was a kid, the best TV set you could buy was a 6" black and white set...today I have a 50" plasma that displays native 1080P. The degree to which software is optimized for TLP is increasing every day.
quote: With regards to the shared cache, you are also incorrect in saying it is why the Core 2 is so fast
I said "one of the reasons"...
quote: theory sounds great, but where the rubber meets the road, the Clovertown does really well, and the main limitations have nothing to do with the trivialities of having a 2+2. In apps that can use it, the quad core shows a dramatic improvement over the dual
Actually, Clovertown is at the bottom when you're talking 4 cores...
For example, a 2P Woodcrest is significantly faster than a similarly clocked Clovertown, and they are essentially the same thing. The reason for this is that the two Woodcrest dice in a Clovertown must share one connection to the MCH, while in a 2P Woodcrest setup each chip has its own connection.
Actually, if you read the article, it says much more what I am saying. It talks mostly about cache, and in the interviews I have seen, that's what Intel touts. Even this article you present as proof shows the opposite: it mentions the memory changes, and then goes on and on about the extra cache and the performance of Core 2, not how quickly it can change with memory standards. Your whole premise is illogical; you are saying that with Nehalem, all of a sudden memory changes will happen slower. That's plain wrong. I am saying that with Nehalem and 45 nm lithography, and the diminishing returns of adding more cache, it makes more sense for Intel to add the controller. Which is more logical to you?
The larger cache makes it unnecessary for the cores to use the FSB as often, thus removing a bottleneck and causing fewer collisions. This has always been the case with multiprocessor configurations. If we have a 2+2, and one set needs to access main memory while the other can access its cache, you'll have fewer collisions than if they both needed to access main memory through the FSB. With a larger cache, you'll have fewer reads to main memory from each set of cores, and thus less contention.
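As a toy model of that effect (the miss rates and access rate here are numbers I made up purely for illustration, not measurements of any real chip):

```python
# Toy model: every L2 miss becomes a 64-byte cache-line fill over the shared FSB.
LINE_SIZE_BYTES = 64
ACCESSES_PER_SEC = 500e6          # assumed memory accesses issued by a pair of cores

def fsb_fill_traffic_gb_s(miss_rate):
    """FSB bytes per second needed to service L2 misses at a given miss rate."""
    return ACCESSES_PER_SEC * miss_rate * LINE_SIZE_BYTES / 1e9

for label, miss_rate in [("smaller L2 (hypothetical 5% miss rate)", 0.05),
                         ("larger L2 (hypothetical 3% miss rate)", 0.03)]:
    print(f"{label}: {fsb_fill_traffic_gb_s(miss_rate):.2f} GB/s of fill traffic")
```

Cut the miss rate and each half of the package asks the shared bus for less data, so the two dual-core dice collide less often.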
I disagree on your remarks about TLP becoming suddenly important. Have you already forgotten about hyperthreading? Also, as I mentioned, there were ALWAYS advantages to writing multithreaded apps, even with one processor. I gave you one example where you always want your application to respond to a user, even if to tell them that you are doing something in the background for them. Another reason is that it is a lot more efficient, yes, even with a single processor. Even with the mighty 286 (an amazing processor for its day) the processor spent way too much time waiting for the I/O subsystems, and a multithreaded application kept the processor busy while one thread waited on the leisurely hard disk. Yes, most programmers are hackers (a term misused now to mean someone that does bad things with code, whereas it meant someone that just sucked and couldn't write elegant code and hacked his way through it with badly written rubbish), but they still knew to write multithreaded stuff before dual cores, particularly with multiprocessing configurations becoming much more common with the P6. I'm not saying you won't see more of an effort, but the way things are being spoken about in the press is that it just takes some effort and these multicores will become absolutely fantastic when the software catches up. It ain't so; it's way overblown, and there are a lot of things that will never be multithreaded because they can't be, and others that only benefit somewhat from it. Others will do great with it; it all depends on the type of application. Not every algorithm can be multithreaded effectively, and anyone who tells you otherwise reads too much press and hasn't coded a day in his or her life.
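For what it's worth, here's the kind of thing I mean about threads and slow I/O even on a single processor (a minimal sketch; the "disk" is just a sleep standing in for blocking I/O):

```python
# Minimal sketch: keep the foreground responsive while a worker waits on slow I/O.
import threading
import time

def slow_disk_read():
    time.sleep(2)                     # stand-in for a leisurely hard disk
    print("background: disk read finished")

worker = threading.Thread(target=slow_disk_read)
worker.start()

while worker.is_alive():              # foreground keeps answering the "user"
    print("foreground: still responsive while the I/O is pending...")
    time.sleep(0.5)

worker.join()
```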
Your remarks about the Itanium are so bad I'm surprised you made them. Are you really this uninformed, or arguing just to argue? I think the latter. The problems with Itanium have nothing to do with ILP, although that was one of Intel's goals with it. The problem is, it remained a goal and has not been realized yet. Are you implying that the Itanium 2 has higher single threaded performance than the Core 2? I hope not.
If it had say 30% higher integer performance per core, on a wide list of applications, you'd have a big point there. It doesn't. It trails, in fact. First of all, I wouldn't call the Itanium a failure, because it's still premature to do so, and I don't like counting out anything that gains market share year after year (albeit at a lower than expected rate). However, the extent to which it has failed to gain the anticipated acceptance has a lot to do with cost, failures to meet schedules on Intel's part, the weird VLIW instruction set that people tend to dislike even as much as x86, and the fact it didn't run mainstream software well. Compatibility is so important, and that's why arguably the worst instruction set (aside from Intel's 432) is still king. Motorola's 68K line was much more elegant. Alpha even ran NT and couldn't dethrone it. It's hard to move people from x86, nearly (or possibly) impossible, and if you think this is some indictment against ILP, you're not even with reality.
Six years to design a processor is absurd, and you should know better. If you want to screw around with numbers why not start around 1991 or so when Intel started work on the P6 and say the Nehalem took 17 years, since some of it will come from there. People love throwing around BS numbers like that because it sounds impressive, but you only need to look at how quickly AMD and Intel add technology to their products to see it doesn't take six years. Look at AMD copying SSE, and Intel copying x86-64. Products now are derivative of earlier generations anyway, so you can't go six years back. The Nehalem will build on the Merced, it's not a totally from scratch processor. The Pentium 4 was pretty close, and the Prescott was a massive overhaul of the processor (much more than the Athlon 64 was vis-a-vis the Athlon), and it didn't take them even close to six years.
quote: Your whole premise is illogical; you are saying that with Nehalem, all of a sudden memory changes will happen slower
???...sigh...I never said anything of the sort. I can see that you are just trying to read into anything published or said just what you want it to say, so I'll stop there. Everyone else can just read the article (and the CC, the other articles Intel published on the subject, etc...). But your misunderstanding becomes clear with the following:
quote: Six years to design a processor is absurd, and you should know better...
Just to pull from a Google search at random (this one from Wikipedia: http://en.wikipedia.org/wiki/CPU_design)
"The design cost of a high-end CPU will be on the order of US $100 million. Since the design of such high-end chips nominally take about five years to complete, to stay competitive a company has to fund at least two of these large design teams to release products at the rate of 2.5 years per product generation"
It's my mistake really...I thought that since you used all of these buzz words, you actually knew the industry. I was wrong...
quote: Look at AMD copying SSE, and Intel copying x86-64. Products now are derivative of earlier generations anyway, so you can't go six years back
This is another misconception of the novice...
1. Things like x86-64 and SSE are published many years before they are built. For example, x86-64 was first published for the public in 2001 (and in fact AMD had started work on it in 1998/9) under the name LDT. In fact, it was released to the open Consortium as freely distributable in April of 2001. The first K8 chip wasn't released until 2003.
Likewise, Intel's Yamhill team began work on x86-64 in 2000/1, though they didn't admit its existence until much later because they wanted to foster support for IA64. The first EM64T chip was released in Q1 2005...
2. Intel and AMD have a comprehensive cross-licensing deal for their patents, and the patents are filed well before development begins...so even before it becomes public, they each know what technology the other is working on many years before release.
There are so many inaccuracies and misunderstandings in your posts that I suggest the following:
1. Use the quote feature so that I can understand just what it is you're responding to. Several of your points have nothing to do with what I said...
2. Try actually posting a link now and then so that we can see that what you're saying isn't just something else you've misunderstood...
I think you have a problem connecting things you say with their logical foundations, and I'll help you with that defect.
You said that Intel's main reason for not putting a memory controller on the chip was because changes in memory happen too quickly. Intel is putting a memory controller on-chip for Nehalem. Therefore, the logical conclusion is that this problem will not be as big of one with Nehalem, since it no longer prevents Intel from doing it. You really didn't understand that? Why am I even arguing with you when you have such gaps in reasoning? I said it was mainly for the real estate savings, and that becomes less of a problem on 45nm since you have more transistors, so it's a logical premise, unlike yours.
It's kind of interesting that you read things, but don't really understand much. First of all, you said six years, now you're down to five. You also assume a completely new design, which isn't the case anymore. They are derivative from previous designs. How long do you think it took to do the original Alpha? Mind you, this is from brainstorming the requirements and what they wanted to do, designing the instruction set, etc... This is when superscalar was extremely unusual, superpipelining was unheard of, and a lot of the features on this processor were very new. Even then, it took less than five years. They have a good story on it from Byte magazine from August 1992.
If you could remember anything, you'd know that AMD was against using SSE and was touting 3D Now! instead. Companies get patents, but they don't tell the whole story, or, for the purpose of designing a processor, any meaningful story. To make the transistor designs, you need to know specifics about how things will act under every situation and the necessary behavior. You are clueless if you think that's in the patents. You also need an actual processor in hand so you can test. You wouldn't want to be AMD and implement just based on specs, because inevitably there would be incompatibilities.
You are also using your pretzel logic with regards to Yamhill. The processors had this logic in them way before they were released, and the design was done well before that. You really don't understand that? The only positive from this is you at least admit it's not six years, but is five. You'll slowly worm your way down to a realistic number, but five isn't so bad.
With regards to what I'm responding to, I could paste your stuff, but you have logical deficiencies. You are talking about multi-core, and can't make the connection to me saying multithreading has been going on forever. Even in 1992 (I got a nice batch of Byte Magazines off of eBay, and I am rereading a few of them), they were talking about how multiple cores were the future, in MIMD or SIMD configurations. How multithreading was going to take over the world, and how programmers were working on it, etc... It's funny, people are so clueless, and they just read articles and repeat them (hey, that's what I'm doing!).
My suggestion to you is to go back and get a nice batch of Byte magazines on eBay, and read them and really try to understand what they're saying, instead of being a parrot that repeats stuff you don't understand and try to sound impressive with it.
I'm done arguing with you, you're not informed enough to even interest me, and I won't even waste my time to read your responses.
quote: You said that Intel's main reason for not putting a memory controller on the chip was because changes in memory happen too quickly
You see? That's why I asked you to actually quote (I really was being quite sincere, it will help you)...that's NOT what I said.
What I said was that this was the reason Intel gave publicly, but that the real reason was that redesign of an architecture takes years not months. This is why they couldn't fit it on to C2D but will be able to on Nehalem...
quote: First of all, you said six years, now you're down to five
I said Nehalem was six years and that the average was five (please go back and reread my posts...or maybe use quote?). I also said that Nehalem's design was changed, which is WHY it took 6 years.
quote: You also assume a completely new design, which isn't the case anymore. They are derivative from previous designs
They are all derivatives of a previous design...for example, the C2D is a derivative of the P3. Did you think that Intel was just twiddling its thumbs? AMD had several years of advantage over the Netburst architecture...don't you think that they would have released the C2D many years earlier if they could have?
quote: AMD was against using SSE and was touting 3D Now!
They use both (even now), but of course they would have preferred just 3D Now (just as Intel would have preferred everyone using just IA64). What's your point?
quote: To make the transistor designs, you need to know specifics about how things will act under every situation and the necessary behavior...You also need an actual processor in hand so you can test
Sigh...
1. You need to learn the difference between "transistor design" and microarchitectural design. Both take a long time, but they are entirely different things (transistor design is part of manufacturing).
2. There are certainly ways to test as the product is being developed. For example, AMD released an AMD64 simulator and debugger to the public in 2000 (http://www.theregister.co.uk/2000/10/14/amd_ships_...)...
3. Even before initial tape-out (this is the first complete mask set), many sets of hand tooled silicon are made to test the individual circuits. This is the reason it takes so long...Each team works on their own specific area, then when the chip is first taped out they work on the processor as a whole unit.
4. Patents are often what initiate parts of the design...but I fail to see your point.
quote: with regards to Yamhill. The processors had this logic in them way before they were released, and the design was done well before that
The first Intel processors to actually have the circuits in them (not activated) were the initial Prescotts. But saying the design was done is ludicrous...can you give a single reason why Intel included the circuits (and remember that it's expensive to add those transistors) without being able to use them other than the design not quite being finished??
quote: I could paste your stuff, but you have logical deficiencies
I see...so instead of actually responding to what I've said, you deem it illogical and make up what I said instead?
quote: I'm done arguing with you
Great idea...best one you've had. And my apologies to everyone for the length of the thread...
I meant to say the Nehalem will build on the Merom. If it built on the Merced, maybe it does take six years, and I'm thinking AMD would have a real good chance of gaining market share.
quote: I'm not sure why you guys don't think an increase in FSB and memory bandwidth (i.e. 1600) is going to help. It seems beyond obvious it will. Will it help enough is the only question.
You really haven't been following processors for the last 12-14 years, have you? It has been proven, time, and time again, that a faster FSB matters more than anything else (aside from processor core speed) for performance. Faster FSB == faster CPU->L1->L2. Memory bandwidth not so much (this is only because nothing takes advantage of memory bandwidth currently, and to be honest, I am not sure anything can, at this point), but DEFINITELY FSB. Since I do not see a faster core speed in the near future, the only other option for faster processors, aside from 'smarter' branch prediction, HAS to be FSB.
Now, since I have spoken against you, I suppose I am a 'dolt', or a 'moron', right ?
Is English your first language? I keep reading your sub-literate drivel and I'm not even sure what you're saying. I think you're agreeing with me that FSB does make a difference, but your writing ability is so poor it's hard to tell.
Either way, you're a moron or dolt, or whatever you choose :P.
You cannot read and understand what I am writing, and I am the dolt or moron . . .
Interesting that . . . interesting indeed. I think what I will do, is just ignore whatever else you have to say, just like the majority of other readers seemingly have done.
This is for the authors. Sorry if I missed it, but do the power measurements include chipset power? AMD processors include the memory controller as well, right? Does the performance/watt figure take this into account?
I am guessing Penryn will be different enough from Clovertown to make using vmotion (and many other enterprise features) impossible. It sucks enough that we already have two processor families in our Dell 2950s, and here comes one more.
I am all for progress, it just looks like this might be something VMware has to address at some point.
...the industry. As usual Intel's "glueblob" is another rushed-out-the-door, knee-jerk reaction to AMD supplying superior CPU products. AMD is really gonna hurt Intel with Barcelona and friends.
the two xeon sockets share a common fsb to memory and io bus, right?
perhaps you should have included a 1-socket xeon vs 2-socket opteron, just to see how they compare when the xeons aren't as starved for bandwidth... not necessarily a 775 xeon and mobo, i imagine the 771 systems you used now would run just fine with just one of the cpu-s.
sure, that would turn into a core 2 extreme quadcore vs amd 4x4, or their server equivalents running server benchmarks instead of games but i'm still curious about it :p
I believe (could be wrong - it might be a future chipset; can't say I'm up-to-date on the server chipsets these days) that the Xeons have a Dual Independent Bus configuration, so they do get double the bandwidth. The only truly fair way of comparing would be a quad core AMD chip against a quad core Intel chip, but we obviously have to wait on AMD there. It's certainly going to be an interesting matchup later this year.
Note that in 2008, Intel will use a quad bus topology similar to HyperTransport, at least on paper, so they are certainly aware of the bus bandwidth problems right now. I'm not sure FB-DIMMs are really helping matters either unless you use huge memory footprints. So FB-DIMMs can be good in the real world but bad for benchmarks that don't utilize all the available RAM.
FB-DIMMs are also un-godly expensive if you need to have 16+ GB in a 2U box. With the Opteron boxes, you tend to have many more DIMM slots, so you can use lower capacity DIMMs.
I thought my eyes were deceiving me, so I had to go back and look at the charts. AMD CPUs are capable of more transactions per second? Wow. Granted, AMD CPUs also seem to use more power, but they also seem to have a 'better' CPU usage curve.
I suppose most companies, and enterprises would probably opt for the intel, based on long term power savings, and probably have an Opteron machine or two, where performance was critical.
It is nice to know that AMD still does something better than intel. Makes me feel better about buying an Opteron 1210 for my desktop, even if it isn't a socket F Opteron . . .
You mean that four top of the line AMD CPUs were outperforming two of Intel's second-fastest CPUs?
Clovertown's performance is very impressive, since according to those results two top of the line 2.66GHz Clovertowns would match the performance of four 2.8GHz Opterons.
If you scroll up a few posts in this thread, you'll see the quote and link...
"...Two 2.4GHz Opteron 880 processors are as fast as one Xeon 5345, but four Opterons outperform the dual quad core Xeon by 16%..."
Ah, right. I think that's part of what Ross was talking about when he discusses the difficulties in coming up with appropriate tests for these systems. The Forum and Dell Store benchmarks had some serious issues, likely related to optimizations and I/O activity. There are instances where Intel does better, and of course others where they do worse.
quote: Ah, right. I think that's part of what Ross was talking about when he discusses the difficulties in coming up with appropriate tests for these systems
In general I've been finding that at 4 cores, K8 and Clovertown run about the same...anything over that goes to AMD. Of course (as Ross points out) a lot of this assumes that the software can actually scale to use the 4 or more cores. For example, MySQL doesn't appear to scale at all...
We can be fairly certain that Barcelona will easily beat out any of the quad core Intel chips...I say this because based on the tests that you and Johan have done, even if Barcelona used old K8 cores they should beat them. However, things will not stand still for long on this front...
1. Barcelona is a transitional chip which won't be on the market for long. The "+" socketed K10s start coming out the following quarter with HT3, and the added bandwidth should be a nice boost.
2. Penryn comes out almost immediately afterwards, and with a 1600 FSB and a much higher clockspeed, it might be able to catch up to the K10s (I think a lot will be determined by what clockspeeds AMD is able to get out of the K10 at 65nm).
3. The most interesting (and closest) area will be the dual cores (where most of us will be living). Because the FSB bottleneck is nowhere near as bad at dual core level, I suspect that Penryn and K10 will be absolutely neck and neck here. This is where we will absolutely need to see benches...I don't think anyone can predict what will happen in the desktop area until Q3 (and the benchmarks) comes around.
As to the power section of the review, you guys did a fine job based on what you had to work with. Certainly it has nothing to do with Barcelona (as you say), and my guess is that you guys are absolutely salivating to get a Barcy for just that reason (I know I can't wait for you to get one!).
The power section is (IMHO) going to be a main event on that chip...I can't wait to see how well the split plane power affects the numbers during benchmarks!
I would like to put in my vote now...when you get your Barcy, could you do a review that encompasses power for real-world server applications? By that I mean could we see what the power draw is during normal use as well as peak and idle...?
I also was a little confused to see AMD outperforming the Intel counterparts, but then I asked myself how far the gap will be when the K10 Opteron comes out. And then just imagine one more time having 2xquad K10 in a 4x4 setup...Godly power?
Remember that the difference is four sockets vs. two sockets. AMD basically gets more bandwidth for every socket (NUMA), so that's why it's not apples to apples. Four dual core Opterons in four sockets is indeed faster in many business benchmarks than two Clovertowns in two sockets. Also remember that we are testing with 2.33 GHz Clovertown and not the 2.66 or 3.0 GHz models, which would easily close the performance gap in the case of the latter.
Don't forget that four Opteron 8220 chips cost substantially more than two Xeon 5355 chips. $1600 x 4 vs. $1200 x 2. Then again, price differences of a few thousand dollars aren't really a huge deal when we're talking about powerful servers. $25000 vs. $27000? Does it really matter if one is 20% faster?
One final point is that I've heard Opteron does substantially better in virtualized environments. Running 32 virtual servers on an 8-way Opteron box will apparently easily outperform 8-way Xeon Clovertown. But that's just hearsay - I haven't seen any specific benches outside of some AMD slides.
quote: Don't forget that four Opteron 8220 chips cost substantially more than two Xeon 5355 chips. $1600 x 4 vs. $1200 x 2. Then again, price differences of a few thousand dollars aren't really a huge deal when we're talking about powerful servers. $25000 vs. $27000? Does it really matter if one is 20% faster?
Yes, and no. Having worked in a data center, you know these types of systems are often specialized for certain situations. That is why I said: majority Intel, and a few high performance AMD. I really don't know what these people are getting riled up about . . .
quote: One final point is that I've heard Opteron does substantially better in virtualized environments. Running 32 virtual servers on an 8-way Opteron box will apparently easily outperform 8-way Xeon Clovertown. But that's just hearsay - I haven't seen any specific benches outside of some AMD slides.
I follow virtualization fairly closely, but I do not examine every_single_aspect. However, I can tell you right now that AMD at minimum does hold the advantage here, because their CPUs do not require a BIOS that is compatible with said technology; Intel CPUs currently do. As for the performance advantage, this could have to do with the Intel systems having to have their BIOS act as middle man. Also, last I read, FB-DIMMs were slower than DDR2 DIMMs, so perhaps this also plays a factor? Another thing: how many Intel boards out there support 4-8 processors? The reason I ask is that I haven't seen any recently, and this could also play a factor *shrug*
It makes one wonder why the processors were compared in the first place. Did you guys throw processors in a hat and then pull them out and decide to benchmark them against each other? Why not throw a Tualatin in there just for kicks?
OK, all sarcasm aside, does anyone actually think about these articles before they are written? It's not OK to put a disclaimer in to say you're making an unfair comparison, and then make it. I know it seems it is, but a lot of people don't read it, and there's an assumption, however false, that the article will be written with some common sense in it. A lot of people will see the charts, and that's what they'll base their reaction on. It is their fault, to some extent, but still, you know they're going to do it, and they're going to get the false impression, and it's thus your fault for spreading misinformation (it amounts to this, even though it is qualified).
If you compared on cost, that would be one thing. If you compared by market segment, still fair. If you compared the only quad core to the only quad core, I wouldn't like it, but I'd still think it was at least supportable. But when you compare a high end 8-way Opteron with a low end Clovertown, you get these reactions from people where they see AMD beating Intel and they consider it significant. Look at your responses, there is no doubt of this.
I'm not saying Opterons are vastly inferior in every situation, or I should say Opteron based systems, only that this article gives a false impression of that because of how people react to articles. They don't read every little thing, they don't even remember it, and they often walk away with a false impression because of poor choices on the part of the reviewers. People are people, you need to deal with that, however annoying it can be. But, even then, the choices were remarkably poor in that they tell a lot less than one based on closer competitors would have. The best Clovertown versus the best Opteron. The same power envelope. The same cost system. All are better choices.
I agree with your response, but the problem is, the charts speak a lot louder than responses, and disclaimers. That's why you put charts in, after all, to get attention to convey ideas effectively. When you do this with improper comparisons, I think you can see the inherent illogic in that while at the same time defending it by talking about disclaimers and such. Again, look at your responses here.
I also think the FB-DIMMS have a lot to do with Intel's relatively poor performance, and I don't think this was emphasized enough. It would be interesting to try to isolate how much of a performance penalty they have, although I don't know if this could be done precisely. Intel seems intent on using them more and more, and I fear they are heading into another RDRAM situation, where it may be a very good technology, but they introduce it too soon where it shows disadvantages and people get a negative impression on it. Obviously, they aren't pushing it the way they did RDRAM, but it seems to come with a much greater performance penalty (the 840 actually performed as well as the 440BX, overall, although the single channel 820 was kind of poor) and the cost is pretty high too, although probably not as bad as RDRAM was.
One last tidbit of information about virtualization, since it's mentioned in the article. It's kind of ironic that such a poor selling machine had so much advanced technology in it, but the IBM RT PC not only paved the way for RISC based processors, but also had virtualization even back in 1986. AIX ran on top of the VRM (Virtual Resource Manager). The VRM was a complete real-time operating system with virtual memory management and I/O subsystem. With it, you could run any OS on top of it, (in practice, it was only AIX), and in fact several at the same time. In fact, it went even further with the VMI, which had a well-defined interface for things like I/O control, processor allocation functions, etc... I'm not sure what my point is, except that most of the "new" stuff today isn't new at all. Intel was talking about multicores in the early 1990s, in fact. I guess the trace-cache and double pumped ALUs were new, but their end product didn't seem to work that great :P.
First off, thanks for the feedback. We spent some time considering what to compare the Clovertown to, and ultimately made the decision to compare based on core count. Was it the right decision? We *think* so, but would have rather compared it to an equivalent part. Is it unfair? Yes. Do people skim read and make comments without having read the article? Sure. Would people have freaked out if we compared a Clovertown to a 4-way socket-f configuration? Absolutely
The decision becomes one based on levels of "unfair"; either decision would have been unfair, which makes it pretty darn difficult to choose. It's a shame people don't read before commenting, although aren't just about all facets of life full of this? Your comment about comparing cost is a good one, although do you really think, given that people don't read power consumption numbers, that they'd read a cost based graph? (Doubtful).
The end game is that Intel made the right decision, and Clovertown is a great product because of that. We are as anxious as everyone else to see what happens with K8L, and then Penryn.
I think the central premise of your remark is that it's not possible to choose completely equal setups, in this instance, and someone would cry foul regardless of your choices because it was not possible to make such firm selections. I am going to proceed with my response based on this premise, and I apologize in advance if I am misunderstanding you.
I do agree with what you're saying, but on the other hand I think you could have made it much closer than it was. I don't agree you minimized the "unfair" factor as well as you could. In fact, the Opteron cost more, ran at much higher clock speeds, and used more power. I'm not even going to complain about FB-DIMMs, or the FSB limitations Intel systems have, because they are inherent to that design and I think are completely legitimate. The benefits of using a memory controller on the chipset are obvious enough in certain configurations (it's kind of odd no one ever brings them up, and simply says the FSB is purely bad, but did anyone ever notice how much bigger the caches are on Intel's processors? Hmmmm, maybe the saved space from not having a memory controller on board? How about video cards accessing memory? Do you really want to make them use the memory controller on the CPU, or add redundant logic to do it without it?). I'm not saying the memory controller on the chipset is better, overall, just that it has advantages that are almost never brought up. However, less and less as lithographies shrink and the size of the memory controller becomes less significant.
OK, back from the digression. I'm saying you should have compared something like a 2.66 GHz Clovertown to a 2.8 GHz Opteron setup. Or taken a lower Opteron to compare with a 2.33 GHz Clovertown. You should stick with the same segment. To put it another way: you might have people that have "x" dollars to spend on a server. So you'd make a valid comparison based on price. It won't work for everyone, but it will for that segment and the others can at least see the logic in it. Or, how about people that have a power bracket they have to go under. The same would apply to them, it would just be a different group (or some would fall into both). Or how about the guy that wants the fastest possible system and has a devil may care attitude towards energy, noise levels, and cost. Your comparison didn't relate well to any group I am aware of. As I mentioned, the Opteron uses more power, is more expensive, while the Clovertown does not represent the best Intel has for that segment.
So, I'm not talking about adding another chart that says something about being cost based. I'm saying compare valid processors in the first place, based on something that will be useful to a segment, as aforementioned, and create a whole bunch of useful charts instead of creating less useful ones and adding a chart at the end to somehow illustrate why it is less useful. I agree, most people won't read it, or even pay much mind to it if they did. That's why I think it's more important to make an intrinsic change to the review, rather than compare unequal processors and show how they are.
I'll try to preempt your most obvious response by saying I realize true equality is impossible in something like this. However, I think we can both agree that you could have gotten something closer than what was done. A lot closer, really.
This is something that almost always comes up in our IT server type reviews. There are numerous facets of this that people never seem to take into consideration. For example, given the cost, there is absolutely no way that we are going to go out and purchase any of this hardware. That means we're basically dependent upon our industry contacts to provide us with appropriate hardware.
Could we push harder for Intel to ship us different processors? Perhaps, but the simple fact of the matter is that Intel shipped us their 2.33 GHz Clovertown processors and AMD shipped us their 2.8 GHz Opteron processors. Intel also shipped us some of their lower clocked parts, which surprisingly didn't use much less power. Clovertown obviously launched quite a while ago, and Ross has been working on this article for some time, trying to come up with a set of reasonable benchmarks. Should we delay things further in order to try and get additional processors, or should we go with what we have?
That's the next problem: trying to figure out how to properly benchmark such systems. FB-DIMMs have some advantages over other potential memory configurations, particularly for enterprise situations where massive amounts of RAM are needed. We could almost certainly come up with benchmarks that show Opteron being significantly faster, or go the other way and show Intel being significantly faster -- it's all dependent upon the benchmark(s) we choose to run. I would assume that Ross took quite a bit of time in trying to come up with some representative benchmarks, but no benchmark is perfect for all situations.
Most of the remaining stuff you mention has been dealt with in previous articles at some point. Continuously repeating the advantages and disadvantages of each platform is redundant, which is why we always went back to previous/related articles. We've talked about the penalties associated with using FB-DIMMs, we talked about overall bus bandwidth, but in the end we're stuck with what is currently available and speculating on how things might be improved by a different architecture is simply that: speculation.
The final point that people always seem to miss is that price really isn't a factor in high-end server setups like this, at least to a point. In many instances, neither is power consumption. Let's take power as an example:
In this particular testing, the quad Opteron system generally maxed out at around 500W while the dual Clovertown system maxed out at around 350W. 150 W is certainly significant, but in the big scheme of things the power cost is not what's important. Let's just say that the company pays $0.10 per kWh, which is reasonable (and probably a bit high). Running 24/7, the total power cost differential in a year's time is a whopping $131.49. If the system is significantly faster, no IT department is really going to care. What they really care about, in some instances, is power density -- performance per watt. A lot of data centers have a maximum amount of power that they can supply (without costly upgrades to the power circuitry), so if they need a large number of servers for something like a supercomputer, they will often want the best performance per watt.
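The arithmetic behind that $131.49, spelled out (same figures as above, with the $0.10 per kWh rate assumed):

```python
# Annual cost of the ~150 W difference between the two test systems.
delta_kw = 0.150                    # 500 W quad Opteron minus 350 W dual Clovertown
hours_per_year = 24 * 365.25
rate_per_kwh = 0.10                 # assumed electricity rate

print(f"Extra power cost per year: ${delta_kw * hours_per_year * rate_per_kwh:.2f}")
# -> Extra power cost per year: $131.49
```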
Going back to price, that can be a very important factor in small to medium business situations. Once you start looking at octal core servers, or even larger configurations, typically prices scale exponentially. Dual socket servers are cheap, and single socket servers are now basically high-end workstations with a few tweaks. The jump from two sockets to four sockets is quite drastic in terms of price, so unless you truly need a lot of power in a single box many companies would end up spending less money if they purchased two dual socket servers instead of one quad socket server. (Unless of course the IT department leads them astray because they want to play with more expensive hardware....) So basically, you buy a $20,000 or more setup because you really need the performance.
As I mentioned above, looking at the price of these two configurations that we tested, the quad Opteron currently would cost a couple thousand dollars more. If the applications that you run perform significantly better with that configuration, does it really matter that quad Opteron is a bit more expensive? On the other hand, as I just finished explaining, a large cluster might very well prefer slightly slower performance per server and simply choose to run more servers (performance per watt).
While I am not the author of this article, I take it as a given that most of our articles are necessarily limited in scope. I would never consider anything that we publish to be the absolute definitive end word on any argument. In the world of enterprise servers, I consider the scope of our articles to be even more limited, simply because there are so many different things that businesses might want to do with their servers. The reality is that most businesses have to spend months devising their own testing scenarios in order to determine what option is the best for upgrades. That or they just ask IBM, Dell, HP, etc. and take whatever the vendor recommends. :|
Your remarks about power are off. I'm not sure if you guys really understand what I'm talking about, or are just arguing for the sake of arguing. People see the performance charts and assume you did a decent job of picking appropriate processors without reading the fine print. You didn't, you did a poor job of it, and people oftentimes miss that because they didn't take the time to read it. A lot of the remarks here are about that. So, many people will read your performance charts and assume these are reasonably comparable parts, when they are not. Drawing almost 50% more power isn't a reasonable comparison, sorry. It's a terrible one.
I'm not at all sure what you're talking about when you bring up the memory and the benchmarks. I had no complaints about them, and you are just stating the obvious when you point out that you can choose benchmarks that would make either processor look better. Benchmarks, taken alone, are always dangerous. People love things simplified, so they cling to them, but they never tell the whole story. So, I agree, but I have no idea why you bring it up.
With regards to bringing up stuff you've covered already, are you saying you've pointed out the advantages of keeping the memory controller on the chipset? I don't see that brought up much at all, and it was unrelated to this article. As I said, I digressed, but the impression I get from hobbyist sites like this is that they all say the integrated memory controller is so much better and the FSB is perfectly horrible and nothing but a bad thing. It's simply not true; integrated memory controllers have been around a long time, and I almost laugh when I see idiots saying how Intel will be copying AMD in doing this. Like AMD was the first to think of it. Or the point-to-point stuff. It's so uninformed, it's both comical and annoying. Intel knows all about this, as does every company making designs, and they each make different tradeoffs. Was it a mistake for Intel to wait this long? I would say no, since the P8 walks away from the K8, and Intel obviously found really good uses for their transistor budget rather than putting a memory controller on the chip. It's not like they don't know how; they obviously chose not to until the P9. One oddity with Intel chips is that the odd-numbered ones almost always suck. 186 anyone? The 386 sucked badly, it wasn't any better than the 286 clock-normalized, and its claim to fame was running real mode elegantly. The Pentium wasn't that bad, but still wasn't great and got super hot, and most of the performance came from making the memory bandwidth four times greater. Pentium 4? Oh my. What were they thinking???? The P9 follows a long, and bad, history :P.
The FB-DIMMs have been spoken about; that isn't what I was commenting on. I do think a lot of people are confused, even when comparing just one dual core processor, by how close the Opteron comes to the P8-based Xeons when they are used to seeing a much greater difference in the desktop parts. It's not just the benchmarks; the FB-DIMMs have serious performance handicaps. I don't think it's mentioned enough, although I agree a full description would be superfluous.
My remarks about price were more in line with making an apples to apples comparison, where you compare things at the processor level so you can see comparative products. Price always matters, always, and always will, even at an end user level. I agree, the cost for four sockets is way too high for most applications, and thus they sell relatively poorly compared to dual socket systems. It's like comparing a Chevrolet Cavalier to a Dodge Viper on a battery of tests, and then saying the Viper costs more, and that's the nature of sports cars like that, so we should be OK with it. Bring out the Corvette so we can see a real comparison between the two companies, not lame excuses as to why you chose the wrong processors and how it really doesn't matter. They are just rationalizations, and no one will believe them. Cost does matter. Always has, always will. Why do you think IBM doesn't dominate the PC market anymore? They made great computers, but they cost those extra few hundred dollars more. Over thousands of machines, it adds up. And even if it didn't, why wouldn't you compare the 2.66 with the 2.8 Opteron, which would have closer cost and would actually illustrate what was available for those people that really needed that performance! You talk about how they need performance and don't care about money, and then test a processor that is cheaper and doesn't have as good performance anyway! No contradiction there, huh?
OK, now I love to argue, so I am putting this last. You should have ended your argument with the point about your access to products, and I would have had nothing to say and that would be that. You are right, you can't buy it and Intel should have sent you better parts, and I would guess you actually did try to get the 2.66 Clovertown. No argument there. It's just the stuff after it that made no sense. But thanks for the explanation; despite all the arguments I think are invalid, you did make one that I can completely understand and that makes sense. Really, Intel is kind of stupid for not sending the better parts and improving their image, and if you can't get them, you can't get them and shouldn't pay for them.
I worked in an IT department for three years that bought extremely expensive servers that were often completely unnecessary. They just purchased the latest high-end Dell servers and figured that would be great. That's still what a lot of large enterprise businesses do. You mention desktops and IBM... and now who's bringing up irrelevant stuff? This article is only about servers, really, and a limited amount of detail at that.
I also have no idea how you can say my power remarks are "off". In what way? Cost? Importance? The calculations are accurate for the factors I stated. If a system is slower but cheaper and uses more power, it will take ages for the extra power cost to overcome the price difference in this particular instance. 150W more on a desktop is a huge deal when you're talking about systems that probably cost $1500 at most. 150W more on a $20,000 server only matters if you are near your power capacity. The company I worked for (at that location) had a power bill of about $50,000 per month. Think they care about $150 per server per year? Well, they only had about a dozen servers, so of course not. A shop that has thousands of servers would care more, but if each server that used less power cost $3000 more, and they only use servers for three years, again it would only be a critical factor if they were nearing their peak power intake.
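As a rough break-even sketch using those same assumed numbers (nothing measured): a $3000 price premium divided by roughly $130-150 per year in power savings works out to a payback of 20 years or more, against a three-year service life.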
In the end it seems like you don't like our graphs because they don't convey the whole story. If people don't read graphs properly and don't have the correct background, they will not draw the appropriate conclusions. End of line. When we specifically mention 2.66 and 3.0 GHz parts, even though these weren't available for testing, that's about as much as we can do right now. If I were writing this article, I'm certain I would have said a lot more, but as the managing editor I felt the basic information conveyed was enough.
The fact of the matter is that while Intel is now clearly ahead on the desktop, on servers it's not a cut-and-dried situation. AMD still has areas where they do well, but there are also areas where Intel does well. The two platforms are also wildly different at present, so any comparisons are going to end up with faults, short of testing about 50 configurations. I defend the article because people attack it, and I don't think they're looking at the big picture. Is the article flawed? In some cases, yes. Does it still convey useful information? I certainly think so, and that's why it was published.
We have no direct financial ties to either company (or we have ties to both companies if you want to look at it from a different perspective), and every reason to avoid skewing results (i.e. sites like us live and die by reputation). Almost every server comparison I've seen in recent years ends up with some of the faults you've pointed out, simply because the hardware is not something review sites can acquire on their own. They either differ on cost, performance segment, features, or some other area. It's the nature of the business.
Your power remarks are off because you ignore the problems heat creates besides the electricity. Working for one company for three years isn't a huge sample set, you know. I have worked for a lot of big companies, and they all are a little different. With regards to heat, you have to evacuate that heat too. Air conditioning and raised floors can get expensive. Reliability is also impacted by heat. It's not a simple dollar amount based on what the processors use. Also, your argument is deceptive at best and meant to confuse. Did you choose these processors based on power/performance ratio? Did you pick the best from AMD and Intel in this regard and do the comparison? No, you didn't. You chose what you had.
A lot of times dumb people in IT buy really expensive stuff on purpose, so they can show they needed the entire IT budget that year and waste it so it isn't cut the subsequent year. I have been part of that, so what you're saying doesn't surprise me. In fact, I have been part of situations where we bought stuff just to buy it, so we'd use up the entire allocation. I have also been in situations where every last dollar is scrutinized. My remarks about IBM were to illustrate that cost matters. Not to everyone, and sometimes in a reverse-logic way like I mentioned above, but for most people it does. If it didn't, why would AMD even bother selling mid-range eight-way Opterons, since everyone would want the best ones? Hmmmmm, it's not that the power usage is much lower, so what is it? I guess it's cost. Why is there more than one Itanium part from Intel when they are used almost exclusively in very expensive machines? Hmmmm, cost again? I think Intel and AMD have a pretty good grip on that, and they do sell multiple tiers of parts even for expensive segments. So, it does matter, although admittedly in individual cases in perverse ways.
I don't like your graphs because they don't convey any story correctly. I don't like that you compared wrong parts. I wouldn't expect graphs to tell the whole story, they can't, but I expect the part they do tell to make some sense. But, again, now I understand it was availability and I don't have a problem with it. Before, I was aghast at your seemingly horrible choice for the comparison. I'm not anymore.
The article probably does convey some useful information, my point was it could have conveyed a lot more with better choices. I don't have a problem with AMD being better at certain things, I knew that and accept that. I like it, in fact. I don't expect much from the P9 either, since Intel (strangely like Motorola with their 68K line) seems to have lousy odd number parts, and from what I'm reading it doesn't seem very good. I hope I'm wrong, but I have a bad feeling about it.
I never said you had any ties to either company, or even implied that you were intentionally favoring one over the other. Don't read anything into what I say; I say what I mean, and if I thought you were intentionally screwing Intel, I'd have said so. I just didn't buy the nonsense about how good the choices were when I knew they sucked, and you did too. You should have just been upfront about why, and I think everyone would have understood. It's certainly better than nothing, and those were your two choices. I can't argue with that. Well, I could, but it would be specious at best :P.
I'd like to point out that I read the whole article, and knew the differences. The simple fact of the matter is that what AnandTech was supplied with is what was tested. End of line. Period. Did Intel not send its higher performing parts because they also do not perform as well as the Opterons (unlikely, but possible)? Who knows. Unless you have the parts yourself to test, you will never know.
What next? "You didn't explore the fine reindeer effect"? Please . . .
About the only people that would really get upset by this type of article are fanboys or employees of Intel and/or AMD. In this case it would probably only be Intel simply because this is a test scenario where Intel isn't substantially ahead. In fact, AMD is probably quite pleased to see some more "fair" results posted. I wouldn't argue much more, Jarred, without this guy specifically answering the following question.
Are you employed by Intel, Mr. TA152H, and is that why you're upset? Wasn't there a law passed recently about disclosure for such things being required? I remember http://www.anandtech.com/cpuchipsets/showdoc.aspx?...">another article where your posts smacked of guerilla marketing, and this is the case again. You're upset that AT didn't tell the story Intel would like to see. At least that's how I see it.
I actually have preferred AMD to Intel most of the time, and was instrumental in bringing them into our company when the Athlon came out, because we were doing computational fluid dynamics and the x87 on it was better.
I have had a lot of dealings with both companies, and actually like them both. I don't want Intel to dominate the way they did, but I don't like articles that are misleading either. I don't have any problem with people saying AMD is better, when they are, but this type of article is bad. I can understand the problems acquiring parts; had they said that in the first place, I would have agreed. I don't like the false arguments, which are wrong and misleading, that rationalize the real reason. They should have just told the truth -- this is what we got, this is what we tested -- instead of trying to hide it and saying they chose these because they were the best choices for the comparison. They weren't, that was my point, and I won't back off of it.
quote: AMD basically gets more bandwidth for every socket (NUMA), so that's why it's not apples to apples
True, but from what we've seen when dual core Opterons came out, putting the cores on-die is a greater advantage than the bandwidth addition from the sockets.
By this I mean that the 1P/DC Opterons consistently beat out the 2P/SC Opterons on most everything (except bandwidth of course)...though the difference was only 5% or so as I recall.
"Here is a first indication that quad core Xeon does not scale as well as the other systems. Two 2.4GHz Opteron 880 processors are as fast as one Xeon 5345, but four Opterons outperform the dual quad core Xeon by 16%. In other words, the quad Opteron system scales 31% better than the Xeon system"
Thanks for clearing that up for me, because I thought it was a simple 2 socket quad core setup. So does this mean a 4x K10 quad core setup will be possible in the future? That would be a 16-way Opteron box -- will Intel have anything to compete with that?
timelag - Wednesday, April 4, 2007 - link
Authors--Er, gosh. Dunno what to make of the preceding discussion. Eh, they don't scare me--I'll post anyway.
Even though the title of this article is "Quad core for the masses", the benchmark is for enterprise database applications. Because of the title, I had expected some workstation benchmarking. Any plans for doing benchmarks for scientific and visualization applications? From bio-tech (BLAST, etc.), to fluid dynamics, to 3D rendering. That sort of thing.
Viditor - Wednesday, April 4, 2007 - link
Didn't mean to put you off there timelag... :) My apologies...
Some of what you're asking for was done in a http://www.anandtech.com/showdoc.aspx?i=2897&p...">previous article by Johan
Beenthere - Saturday, March 31, 2007 - link
Intel's attempt to use two dual cores on a slice of silicon and call it a quad core shows how easily they can manipulate the media with foolishness. Only a fool would buy Intel's inferior 2+2 design when they can have Barcelona and it's many superior derivatives.JarredWalton - Saturday, March 31, 2007 - link
Riiight... only a fool would get QX6700 right now when Barcelona isn't out. Two chips in a package has disadvantages, but there are certainly instances where it will easily outperform the 2x2 Opteron, even in eight-way configurations. There are applications that are not entirely I/O bound, or bandwidth bound. When it comes down to the CPU cores, Core 2 is significantly faster than any Opteron right now.As an example, a 2.66 GHz Clovertown (let alone a 3.0 GHz Xeon) as part of a 3D rendering farm is going to be a lot better than two 2.8 GHz (or 3.0 GHz...) Opteron parts. Two Xeon 5355 will also be better than four Opteron 8220 in that specific instance, I'm quite sure. The reason is the 4MB per chip L2 is generally enough for 3D rendering. There are certainly other applications where this is the case, but whether they occur more than the other way (i.e. 4x2 Opteron being faster than 2x4 Xeon) I couldn't say.
AMD isn't really going to have a huge advantage because of native quad core with Barcelona, and Intel wouldn't get a huge boost by having native quad core either. If you thought about it more, you would realize that the real reason Intel's quad core chips have issues with some applications is that all four cores are pulling data over a single FSB connection - one connection per socket. Intel has to use that single FSB link for RAM, Northbridge, and inter-CPU communications.
In contrast AMD's "native quad core" will have to have all four cores go over the same link for RAM access (potential bottleneck). They can use another HT link to talk to another socket (actually two links), and they can use the third HT link to talk to the Northbridge. The inter-CPU communication generally isn't a big deal, and Northbridge I/O is also a much smaller piece of the bandwidth pie than RAM accesses. It's just that AMD gets all the RAM bandwidth possible. AMD could have done a "two die in one package" design and likely had better scaling than Intel, but they chose not to.
And of course Intel will be going to something similar to HyperTransport with Nehalem in 2008. Even they recognize that the single FSB solution is getting to be severely inadequate for many applications.
Viditor - Saturday, March 31, 2007 - link
Actually, that's not true Jarred.
http://www.anandtech.com/showdoc.aspx?i=2897&p...">Johan's test benchmarked exactly that scenario, and C2D was equal at 4 cores and slightly slower at 8 cores. This was a 2.33 GHz Clovertown vs the 2.4 GHz Opterons...
Viditor - Saturday, March 31, 2007 - link
Let me add that there are cases where it could be true, but only when the apps don't scale at all...and in that case, even a single or dual core sometimes beats the Clovertowns.JarredWalton - Sunday, April 1, 2007 - link
Okay, wrong example then. Heh. The point is I am sure there are benchmarks where the FSB bottleneck isn't as pronounced. Anything that can stay mostly within the CPU cache will be very happy with the current Xeon 53xx chips. Obviously, the decision as to what is important will be the deciding factor, so companies should research their application needs first and foremost.
Getting back to the main point of the whole article, clearly there are areas where Opteron can outperform Xeon with an equal number of cores. Frankly, I doubt a 1600 FSB is going to really help, hence the need for the new high speed link with Nehalem on the part of Intel. K10 could very well end up substantially ahead in dual and quad socket configurations later this year, even if it only runs at 2.3 GHz. I guess we'll have to wait and see... for all we know, the current memory interface on AMD might not actually be able to manage feeding quad cores any better than Intel's FSB does.
Viditor - Sunday, April 1, 2007 - link
Actually, it appears (at least from the stuff I've seen so far) that the only apps that aren't affected by the bottleneck are the ones that are just as good on a dual core...in other words they don't scale well.
I agree with the AMD exec who intimated that AMD made a HUGE mistake in not coming out with an MCM quad chip in November...I think that the benches would have been nicely into the Opteron side of things well before Barcelona, but of course only on the quad chip.
I absolutely agree...I've been saying for the last year that AMD will most likely retake the lead again (even against Penryn), but that Nehalem is a whole nother ballgame...
I suppose that's possible, but if it were true then I think every executive at AMD would have dumped all of their shares by now. :)
That's just as valid as saying it's possible that there's a flaw in Penryn when it gets over 2.8 GHz...possible, but I strongly doubt it.
TA152H - Monday, April 2, 2007 - link
I'm not sure why you guys don't think an increase in FSB and memory bandwidth (i.e. 1600) isn't going to help. It seems beyond obvious it will. Whether it will help enough is the only question.
With regards to the 2+2 from Intel, why does anyone really care? In some ways it's better than a true four in that you can clock them higher, because you can pick pairs that make the grade instead of hoping that all four cores can clock really high. If one of the four can't, well, the whole thing has to be degraded. With Intel's approach, if one set of cores is not capable at a certain speed, you just match it with one that is fairly close to it and sell it like that. It allows them to clock higher, and sell them less expensively than they would if they made a big quad-core die. The performance is excellent too, so it's a pretty good solution.
Why would AMD not have problems with quad-cores similar to Intel's? You still have four cores sucking data through one memory bus, right? Or am I missing something? Is AMD going to have a memory bus for each core? That seems strange to me, so I'm going to assume they are not. The memory controller and point to point bus don't fundamentally change that problem. This comparison was fairly grotesque in that it made the memory subsystem for the Opteron seem better than it was. You had eight cores, yes, but spread across four sockets, so only two cores were fighting for the same bus since it's point to point. That's the advantage. If you have more sockets, the AMD solution will scale better, although NUMA has horrible penalties when you leave the processor's own memory. If you add more cores to the same socket, you still have fundamentally the same problem, and point to point really isn't going to change that. You have four cores hitting the same bus either way.
With regards to the FSB, remember it's also the reason why Intel processors have more cache. It's not a coincidence that Intel processors have more cache; it's because AMD uses so much room on the processor for the memory controller. Intel decided they'd rather use the transistors for other things. I'm not speculating either, Intel has actually said this. Intel could have added a memory controller a long time ago, but they didn't. In fact, in the mid 1990s there was a company called NexGen (which AMD bought because they couldn't design a decent processor from scratch at the time, and had a lot of problems with the K5 that alienated companies like Compaq) which had an onboard memory controller with the NX586. Jerry Sanders decided to can it for the NX686 and use a standard Socket 7 platform instead of NexGen's proprietary one for what became the K6. The K6-III+ is a really interesting chip; you can actually change the multiplier on the fly without rebooting (I still use it for some servers, for exactly that reason).
Viditor - Monday, April 2, 2007 - link
Certainly it will help...but keep this in mind (going towards your question at the end):
1. Both this review and the one Johan did show the old K8 clearly doing as well or better than C2D across the board already (with 4 cores or more)...and Johan's numbers were on an Opteron using very old PC2700 memory as well (Jason and Ross didn't list their memory type).
2. While Barcelona will be HT 2.0, it will be the last one at this speed...the rest of the K10s (the ones that Penryn will be competing with) will be HT 3.0. In other words, while the FSB of Penryn systems will be raised from 1333 to 1600, the K10s will be going from 1 GHz to between 1.8 and 2.6 GHz...
Mainly because of the way it affects Intel's interchip communication. Remember that as apps become more parallel, they also require more communication between the cores. One of the great advances in C2D was the shared cache; the other was the Bensley platform (individual connections to the MCH). However, with an MCM quad core, the only path for one half of the chip to talk to the other half is through the FSB (MCH). In essence, you have 2 caches (each DC has a single cache, and there are 2 DCs per CPU) per MCH connection, so we are back to a shared FSB again (in fact 2 shared FSBs). This recreates the bottleneck that the shared cache and Bensley were designed to get rid of...
Ummm...that's not how they manufacture their chips (and it would be outrageously expensive to do so!). The testing occurs after the cores have been placed on the chip...
Yes, you are...
First is the interchip communication I spoke of. HT allows for direct connections between the caches of different chips, and the chips themselves have the cache directly connected on-die through a dedicated internal bus. That bus has 2 memory controllers connected directly to system memory as well as its own dedicated HT connection (called cHT) to other caches. Remember that, by contrast, Intel must route everything through the single MCH...
Actually, the reason Intel gave for not having an on-die memory controller is that memory standards change too quickly. But what they didn't say is that it takes many years (about 5 on average) to design and release a new chip, and an on-die memory controller is a major architectural change. That's why we don't see it on C2D, but we will see it on Nehalem...
TA152H - Monday, April 2, 2007 - link
Viditor,
Are you making this stuff up, or going by what Intel has said?
Intel has said that the reason they haven't gone with an on-board memory controller, with respect to the Core 2, is that they preferred to use the silicon for the cache and other things. I think a lot of it is because they sell a lot of IGPs, and didn't want the awkward arrangement of either adding another memory controller outside the processor, or having to use the processor's memory controller since the IGP doesn't have its own memory. The last part is speculation on my part; Intel said they preferred to use the transistors differently, and used cache as an example.
Your argument has now become comparative, rather than absolute, going back to what I was saying about whether it helps enough. Also remember that Penryn will have larger caches, which helps mitigate this problem since you will have less contention. Both together should make a reasonably large impact on bandwidth restricted situations.
With regards to 2+2, actually, you're wrong on that. That's exactly what Intel said. They commented that they are able to run them at higher clock speeds than they could if they went native four, since they can test before they are all together rather than have to downbin, or throw away, a whole part if one of the dual cores is a failure or can't clock high. It's not speculation on my part.
Apps becoming more parallel is kind of a bad joke that clueless people talk about. Multithreading has been around since 1988 with OS/2, and back then I was writing multithreaded apps. Even for single processors you did this, because good programmers wanted their application to always be responsive to the user even while it was doing things for them. Admittedly, Windows was quite a bit behind, but multithreading is nothing new, and there are limitations to it that simply can't be overcome. For some applications it works great; for others you can't use it. Multiple cores are fairly new mainly because AMD and Intel can't think of anything better to do with the transistors, but multiprocessor computers are not, and people have been writing applications for them for many, many years (myself included). ILP applies to everything, TLP does not, and TLP is essentially an admission from CPU makers that they are in a very, very diminishing returns situation with regards to transistors and performance.
With regards to the shared cache, you are also incorrect in saying it is why the Core 2 is so fast. It's a tradeoff, and you seem to ignore that the L2 now has four more wait states because it is shared by two processors. I'm not sure how many more they'd have to add if it were shared among four cores, but it wouldn't be a free lunch.
Also keep in mind, theory sounds great, but where the rubber meets the road the Clovertown does really well, and the main limitations have nothing to do with the trivialities of having a 2+2. In apps that can use it, the quad core shows a dramatic improvement over the dual. The FSB problems do show up in these benchmarks rather vividly though, not as a percentage or two that isn't easily noticed.
Viditor - Monday, April 2, 2007 - link
TA152H
I don't "make stuff up", mate...
"Intel does not integrate the memory controller. One reason is that memory standards change. Current Athlon computers, for instance, don't come with DDR II memory because the integrated memory controller connects to DDR I. Intel once tried to come out with a chip, Timna, that had an integrated memory controller that hooked up to Rambus. The flop of Rambus in the market led to the untimely demise of the chip"
http://news.com.com/2061-10791_3-6047412.html">News.com story
While they also listed the large cache and space as a "reason", this was the reason they mentioned most often in interviews.
If by your insinuation you were questioning how long it takes to build a chip, I'm afraid that is just a result of many years of industry knowledge on my part (though if you ask anybody who works in the semi industry, they will confirm this for you).
Nehalem, for example, began its design almost 6 years ago, and has been delayed because of necessary architectural changes (similar to the way Itanium was).
Actually, the large cache doesn't help at all with the MCH bottleneck problem...in fact it makes it slightly worse. Remember that the data path for interchip communication is from cache to cache, not from system memory to cache. The larger cache (with the help of a good prefetcher) certainly helps reduce memory latency (though not as much as an on-die controller)...
Actually, multi-cores have been around for a while...The Power4 was dual core back in 2000. What's new is that mainstream consumer level apps are being written for TLP because single cores are to be phased out...
Not true...Intel tried to convert everything to ILP with Itanium and EPIC, but it was the market (and in many cases the software companies) that decided that it was too hard and too expensive for not enough gain. Most (if not all) software companies are now developing for greater TLP efficiency, as this allows a much smoother transition (evolutionary vs revolutionary).
Sure multithreading has been around for a long time, I used many programs on my old Amiga that were multi-threaded...but it's a matter of degree.
To use an analogy, when I was a kid, the best TV set you could buy was a 6" black and white set...today I have a 50" plasma that displays native 1080P. The degree to which software is optimized for TLP is increasing every day.
I said "one of the reasons"...
Actually, Clovertown is at the bottom when you're talking 4 cores...
For example, a 2P Woodcrest setup is significantly faster than a similarly clocked Clovertown, and they are essentially the same thing. The reason for this is that the 2 Woodcrest dies on the Clovertown must share one connection to the MCH, while the 2 chips in a 2P Woodcrest setup each have their own connection.
TA152H - Tuesday, April 3, 2007 - link
Actually, if you read the article, it says much more what I am saying. It talks mostly about cache, and in the interviews I have seen, that's what Intel touts. Even this article you present as proof shows the opposite: it mentions the memory changes, and then goes on and on about the extra cache and the performance of Core 2, not how quickly it can change with memory standards. Your whole premise is illogical; you are saying that with the Nehalem all of a sudden memory changes will happen slower. That's plain wrong. I am saying that with the Nehalem, 45 nm lithography, and the diminishing returns from adding more cache, it makes more sense for Intel to add the controller. Which is more logical to you?
The larger cache makes it unnecessary for the cores to use the FSB as often, thus removing a bottleneck and causing fewer collisions. This has always been the case with multiprocessor configurations. If we have a 2+2, and one set needs to access main memory while the other can access its cache, you'll have fewer collisions than if they both needed to access main memory through the FSB. With a larger cache, you'll have fewer reads to main memory from each set of cores, and thus less contention.
I disagree with your remarks about TLP suddenly becoming important. Have you already forgotten about hyperthreading? Also, as I mentioned, there were ALWAYS advantages to writing multithreaded apps, even with one processor. I gave you one example: you always want your application to respond to the user, even if only to tell them that you are doing something in the background for them. Another reason is that it is a lot more efficient, yes, even with a single processor. Even with the mighty 286 (an amazing processor for its day) the CPU spent way too much time waiting for the I/O subsystems, and a multithreaded application kept it busy while one thread waited on the leisurely hard disk. Yes, most programmers are hackers (a term misused now to mean someone who does bad things with code, whereas it meant someone who just sucked, couldn't write elegant code, and hacked his way through it with badly written rubbish), but they still knew to write multithreaded stuff before dual cores, particularly with multiprocessing configurations becoming much more common with the P6. I'm not saying you won't see more of an effort, but the way things are being spoken about in the press is that it just takes some effort and these multicores will become absolutely fantastic when the software catches up. It ain't so; it's way overblown, and there are a lot of things that will never be multithreaded because they can't be, and others that only benefit somewhat from it. Others will do great with it; it all depends on the type of application. Not every algorithm can be multithreaded effectively, and anyone who tells you otherwise reads too much press and hasn't coded a day in his or her life.
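For anyone who has never written one, here is a minimal sketch of the pattern I'm describing (Python, purely for illustration; the function and timings are made up) -- one thread waits on the slow I/O while the main thread stays responsive:

    import threading, time

    def fetch_report(results):
        # stands in for a slow disk or network operation
        time.sleep(2.0)
        results.append("report ready")

    results = []
    worker = threading.Thread(target=fetch_report, args=(results,))
    worker.start()

    # the main thread keeps servicing the user while the worker waits on I/O
    while not results:
        print("still responsive...")
        time.sleep(0.5)
    print(results[0])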
Your remarks about the Itanium are so bad I'm surprised you made them. Are you really this uninformed, or arguing just to argue? I think the latter. The problems with Itanium have nothing to do with ILP, although that was one of Intel's goals with it. The problem is, it remained a goal and has not been realized yet. Are you implying that the Itanium 2 has higher single threaded performance than the Core 2? I hope not.
If it had, say, 30% higher integer performance per core on a wide list of applications, you'd have a big point there. It doesn't. It trails, in fact. First of all, I wouldn't call the Itanium a failure, because it's still premature to do so, and I don't like counting out anything that gains market share year after year (albeit at a lower than expected rate). However, the extent to which it has failed to gain the anticipated acceptance has a lot to do with cost, failures to meet schedules on Intel's part, the weird VLIW instruction set that people tend to dislike even as much as x86, and the fact it didn't run mainstream software well. Compatibility is so important, and that's why arguably the worst instruction set (aside from Intel's 432) is still king. Motorola's 68K line was much more elegant. Alpha even ran NT and couldn't dethrone it. It's hard to move people from x86, nearly (or possibly) impossible, and if you think this is some indictment against ILP, you're not even with reality.
Six years to design a processor is absurd, and you should know better. If you want to screw around with numbers why not start around 1991 or so when Intel started work on the P6 and say the Nehalem took 17 years, since some of it will come from there. People love throwing around BS numbers like that because it sounds impressive, but you only need to look at how quickly AMD and Intel add technology to their products to see it doesn't take six years. Look at AMD copying SSE, and Intel copying x86-64. Products now are derivative of earlier generations anyway, so you can't go six years back. The Nehalem will build on the Merced, it's not a totally from scratch processor. The Pentium 4 was pretty close, and the Prescott was a massive overhaul of the processor (much more than the Athlon 64 was vis-a-vis the Athlon), and it didn't take them even close to six years.
Viditor - Tuesday, April 3, 2007 - link
???...sigh...I never said anything of the sort. I can see that you are just trying to read into anything published or said just what you want it to say, so I'll stop there. Everyone else can just read the article (and the CC, the other articles Intel published on the subject, etc...). But your misunderstanding comes clear with the following:
Just to pull from a Google at random (this one from http://en.wikipedia.org/wiki/CPU_design">Wikipedia)
"The design cost of a high-end CPU will be on the order of US $100 million. Since the design of such high-end chips nominally take about five years to complete, to stay competitive a company has to fund at least two of these large design teams to release products at the rate of 2.5 years per product generation"
It's my mistake really...I thought that since you used all of these buzz words, you actually knew the industry. I was wrong...
This is another misconception of the novice...
1. Things like x86-64 and SSE are published many years before they are built. For example, x86-64 was first published for the public in 2001 (and in fact AMD had started work on it in 1998/9) under the name LDT. In fact, it was released to the open Consortium as freely distributable in April of 2001. The first K8 chip wasn't released until 2003.
Likewise, Intel's Yamhill team began work on x86-64 in 2000/1, though they didn't admit its existence until much later because they wanted to foster support for IA64. The first EM64T chip was released in Q1 2005...
2. Intel and AMD have a comprehensive cross-licensing deal for their patents, and the patents are filed well before development begins...so even before it becomes public, they each know what technology the other is working on many years before release.
There are so many inaccuracies and misunderstandings in your posts that I suggest the following:
1. Use the quote feature so that I can understand just what it is you're responding to. Several of your points have nothing to do with what I said...
2. Try actually posting a link now and then so that we can see that what you're saying isn't just something else you've misunderstood...
TA152H - Wednesday, April 4, 2007 - link
I think you have a problem connecting things you say with their logical foundations, and I'll help you with that defect.
You said that Intel's main reason for not putting a memory controller on the chip was that changes in memory happen too quickly. Intel is putting a memory controller on-chip for the Nehalem. Therefore, the logical conclusion is that this problem will not be as big of one with the Nehalem, since it no longer prevents Intel from doing it. You really didn't understand that? Why am I even arguing with you when you have such gaps in reasoning? I said it was mainly for the real estate savings, and that becomes less of a problem at 45nm since you have more transistors, so it's a logical premise, unlike yours.
It's kind of interesting that you read things but don't really understand much. First of all, you said six years; now you're down to five. You also assume a completely new design, which isn't the case anymore. They are derived from previous designs. How long do you think it took to do the original Alpha? Mind you, this is from brainstorming the requirements and what they wanted to do, designing the instruction set, etc... This was when superscalar was extremely unusual, superpipelining was unheard of, and a lot of the features on this processor were very new. Even then, it took less than five years. There's a good story on it in Byte magazine from August 1992.
If you could remember anything, you'd know that AMD was against using SSE and was touting 3D Now! instead. Companies get patents, but they don't tell the whole story or, for the purpose of designing a processor, any meaningful story. To make the transistor designs, you need to know specifics about how things will act under every situation and the necessary behavior. You are clueless if you think that's in the patents. You also need an actual processor in hand so you can test. You wouldn't want to be AMD and implement just based on specs, because inevitably there would be incompatibilities.
You are also using your pretzel logic with regards to Yamhill. The processors had this logic in them well before they were released, and the design was done well before that. You really don't understand that? The only positive from this is that you at least admit it's not six years, but five. You'll slowly worm your way down to a realistic number, but five isn't so bad.
With regards to what I'm responding to, I could paste your stuff, but you have logical deficiencies. You are talking about multi-core, and can't make the connection to me saying multithreading has been going on forever. Even in 1992 (I got a nice batch of Byte Magazines off of eBay, and I am rereading a few of them), they were talking about how multiple cores were the future, in MIMD or SIMD configurations. How multithreading was going to take over the world, and how programmers were working on it, etc... It's funny, people are so clueless, and they just read articles and repeat them (hey, that's what I'm doing!).
My suggestion to you is to go back and get a nice batch of Byte magazines on eBay, and read them and really try to understand what they're saying, instead of being a parrot that repeats stuff you don't understand and try to sound impressive with it.
I'm done arguing with you, you're not informed enough to even interest me, and I won't even waste my time to read your responses.
Viditor - Wednesday, April 4, 2007 - link
You see? That's why I asked you to actually quote (I really was being quite sincere, it will help you)...that's NOT what I said.
What I said was that this was the reason Intel gave publicly, but that the real reason was that redesign of an architecture takes years not months. This is why they couldn't fit it on to C2D but will be able to on Nehalem...
I said Nehalem was six years and that the average was five (please go back and reread my posts...or maybe use quote?). I also said that the reason was that Nehalem was changed, which is WHY it took six years.
They are all derivatives of a previous design...for example, the C2D is a derivative of the P3. Did you think that Intel was just twiddling its thumbs? AMD had several years of advantage over the Netburst architecture...don't you think that they would have released the C2D many years earlier if they could have?
They use both (even now), but of course they would have preferred just 3D Now (just as Intel would have preferred everyone using just IA64). What's your point?
Sigh...
1. You need to learn the difference between "transistor design" and microarchitectural design. Both take a long time, but they are entirely different things (transistor design is part of manufacturing).
2. There are certainly ways to test as the product is being developed. For example, AMD released an AMD64 http://www.theregister.co.uk/2000/10/14/amd_ships_...">simulator and debugger to the public in 2000...
3. Even before initial tape-out (this is the first complete mask set), many sets of hand tooled silicon are made to test the individual circuits. This is the reason it takes so long...Each team works on their own specific area, then when the chip is first taped out they work on the processor as a whole unit.
4. Patents are often what initiate parts of the design...but I fail to see your point.
The first Intel processors to actually have the circuits in them (not activated) were the initial Prescotts. But saying the design was done is ludicrous...can you give a single reason why Intel included the circuits (and remember that it's expensive to add those transistors) without being able to use them other than the design not quite being finished??
I see...so instead of actually responding to what I've said, you deem it illogical and make up what I said instead?
Great idea...best one you've had. And my apologies to everyone for the length of the thread...
TA152H - Tuesday, April 3, 2007 - link
Yikes, holy typos Batman. I meant to say the Nehalem will build on the Merom. If it built on the Merced, maybe it does take six years, and I'm thinking AMD would have a real good chance of gaining market share.
yyrkoon - Monday, April 2, 2007 - link
You really haven't been following processors for the last 12-14 years, have you? It has been proven time and time again that a faster FSB is more important than anything else (aside from processor core speed) for performance. Faster FSB == faster CPU->L1->L2. Memory bandwidth not so much (this is only because nothing takes advantage of memory bandwidth currently, and to be honest, I am not sure anything can at this point), but DEFINITELY FSB. Since I do not see faster core speeds in the near future, the only other option for faster processors, aside from 'smarter' branch prediction, HAS to be FSB.
Now, since I have spoken against you, I suppose I am a 'dolt', or a 'moron', right?
TA152H - Monday, April 2, 2007 - link
Is English your first language? I keep reading your sub-literate drivel and I'm not even sure what you're saying. I think you're agreeing with me that FSB does make a difference, but your writing ability is so poor it's hard to tell.
Either way, you're a moron or dolt, or whatever you choose :P.
yyrkoon - Tuesday, April 3, 2007 - link
Yeah, ok, I am agreeing with you. Your triple negative threw me off there . . .
yyrkoon - Monday, April 2, 2007 - link
You can not read and understand what I am writing, and I am the dolt or moron . . . Interesting that . . . interesting indeed. I think what I will do is just ignore whatever else you have to say, just like the majority of other readers seemingly have done.
archcommus - Friday, March 30, 2007 - link
However if Barcelona comes out and then Penryn smashes it just a few months later, yeah, then I'm gonna be worried about them. :(
Griswold - Saturday, March 31, 2007 - link
Say no to drugs.
anony - Friday, March 30, 2007 - link
This is for the authors. Sorry if I missed it, but do the power measurements include chipset power? AMD processors include the memory controller as well,
right? Do the performance/watt take this into account?
Ross Whitehead - Friday, March 30, 2007 - link
We measured power at the wall, but we do not include the power for the disk chassis. Thus, performance/watt takes all of your mentioned items into account.
blckgrffn - Friday, March 30, 2007 - link
I am guessing Penryn will be different enough from Clovertown to make using VMotion (and many other enterprise features) impossible. It sucks enough that we already have two processor families in our Dell 2950's, and here comes one more.
I am all for progress; it just looks like this might be something VMware has to address at some point.
Nat
Beenthere - Friday, March 30, 2007 - link
...the industry. As usual Intel's "glueblob" is another rushed-out-the-door, knee-jerk reaction to AMD supplying superior CPU products. AMD is really gonna hurt Intel with Barcelona and friends.
johnsonx - Friday, March 30, 2007 - link
Beenthere + Cornfedone = Cramitpal
Griswold - Saturday, March 31, 2007 - link
You forgot to add some "fine-ass".
Phynaz - Friday, March 30, 2007 - link
Wow, you really are a moron.
Visual - Friday, March 30, 2007 - link
the two xeon sockets share a common fsb to memory and io bus, right? perhaps you should have included a 1-socket xeon vs 2-socket opteron, just to see how they compare when the xeons aren't as starved for bandwidth... not necessarily a 775 xeon and mobo, i imagine the 771 systems you used now would run just fine with just one of the cpu-s.
sure, that would turn into a core 2 extreme quadcore vs amd 4x4, or their server equivalents running server benchmarks instead of games but i'm still curious about it :p
JarredWalton - Friday, March 30, 2007 - link
I believe (could be wrong - it might be a future chipset; can't say I'm up-to-date on the server chipsets these days) that the Xeons have a Dual Independent Bus configuration, so they do get double the bandwidth. The only truly fair way of comparing would be a quad core AMD chip against a quad core Intel chip, but we obviously have to wait on AMD there. It's certainly going to be an interesting matchup later this year.
Note that in 2008, Intel will use a quad bus topology similar to HyperTransport, at least on paper, so they are certainly aware of the bus bandwidth problems right now. I'm not sure FB-DIMMs are really helping matters either unless you use huge memory footprints. So FB-DIMMs can be good in the real world but bad for benchmarks that don't utilize all the available RAM.
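For rough context on what "double the bandwidth" means here (my numbers, assuming the 1333MT/s bus used by the faster 5300-series parts): a 64-bit FSB at 1333MT/s moves about 8 bytes x 1333M, or roughly 10.7GB/s per bus, so two independent buses give around 21GB/s aggregate on paper, versus everything funneling through a single ~10.7GB/s link on a shared-bus setup.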
DigitalFreak - Friday, March 30, 2007 - link
FB-DIMMs are also ungodly expensive if you need to have 16+ GB in a 2U box. With the Opteron boxes, you tend to have many more DIMM slots, so you can use lower capacity DIMMs.
I thought my eyes were deceiving me, so I had to go back and look at the charts. AMD CPUs are capable of more transactions per second? Wow. Granted, AMD CPUs also seem to use more power, but they also seem to have a 'better' CPU usage curve.
I suppose most companies and enterprises would probably opt for the Intel, based on long term power savings, and probably have an Opteron machine or two where performance was critical.
It is nice to know that AMD still does something better than Intel. Makes me feel better about buying an Opteron 1210 for my desktop, even if it isn't a Socket F Opteron . . .
Phynaz - Friday, March 30, 2007 - link
No.
The tested SYSTEM is capable of more transactions per second.
defter - Friday, March 30, 2007 - link
You mean that four top of the line AMD CPUs were outperforming two of Intel's second-fastest CPUs?
Clovertown's performance is very impressive, since according to those results two top of the line 2.66GHz Clovertowns would match the performance of four 2.8GHz Opterons.
Viditor - Friday, March 30, 2007 - link
It may be less impressive than you think as 4 dual core 2.4GHz Opterons beat 2 quad core 2.33GHz Clovertowns (by 16%).
JarredWalton - Friday, March 30, 2007 - link
I'm not sure where you get that comparison. Four dual core 2.8 GHz Opterons beat two 2.33 GHz Clovertown by 16% - in certain situations.
Viditor - Friday, March 30, 2007 - link
If you scroll up a few posts in this thread, you'll see the quote and link...
"...Two 2.4GHz Opteron 880 processors are as fast as one Xeon 5345, but four Opterons outperform the dual quad core Xeon by 16%..."
JarredWalton - Friday, March 30, 2007 - link
Ah, right. I think that's part of what Ross was talking about when he discusses the difficulties in coming up with appropriate tests for these systems. The Forum and Dell Store benchmarks had some serious issues, likely related to optimizations and I/O activity. There are instances where Intel does better, and of course others where they do worse.
Viditor - Saturday, March 31, 2007 - link
In general I've been finding that at 4 cores, K8 and Clovertown run about the same...anything over that goes to AMD. Of course (as Ross points out) a lot of this assumes that the software can actually scale to use the 4 or more cores. For example, MySQL doesn't appear to scale at all...
We can be fairly certain that Barcelona will easily beat out any of the quad core Intel chips...I say this because, based on the tests that you and Johan have done, even if Barcelona used old K8 cores it should beat them. However, things will not stand still for long on this front...
1. Barcelona is a transitional chip which won't be on the market for long. The "+" socketed K10s start coming out the following quarter with HT3, and the added bandwidth should be a nice boost.
2. Penryn comes out almost immediately afterwards, and with a 1600 FSB and a much higher clockspeed, it might be able to catch up to the K10s (I think a lot will be determined by what clockspeeds AMD is able to get out of the K10 at 65nm).
3. The most interesting (and closest) area will be the dual cores (where most of us will be living). Because the FSB bottleneck is nowhere near as bad at dual core level, I suspect that Penryn and K10 will be absolutely neck and neck here. This is where we will absolutely need to see benches...I don't think anyone can predict what will happen in the desktop area until Q3 (and the benchmarks) comes around.
As to the power section of the review, you guys did a fine job based on what you had to work with. Certainly it has nothing to do with Barcelona (as you say), and my guess is that you guys are absolutely salivating to get a Barcy for just that reason (I know I can't wait for you to get one!).
The power section is (IMHO) going to be a main event on that chip...I can't wait to see how the split plane power affects the numbers during benchmarks!
I would like to put in my vote now...when you get your Barcy, could you do a review that encompasses power for real-world server applications? By that I mean could we see what the power draw is during normal use as well as peak and idle...?
Cheers, and thanks for the article!
ButterFlyEffect78 - Friday, March 30, 2007 - link
I also was a little confused to see AMD outperforming the Intel counterparts, but then I asked myself how big the gap will be when the K10 Opteron comes out. And then just imagine one more time having 2x quad K10 in a 4x4 setup...Godly power?
JarredWalton - Friday, March 30, 2007 - link
Remember that the difference is four sockets vs. two sockets. AMD basically gets more bandwidth for every socket (NUMA), so that's why it's not apples to apples. Four dual core Opterons in four sockets are indeed faster in many business benchmarks than two Clovertowns in two sockets. Also remember that we are testing with the 2.33 GHz Clovertown and not the 2.66 or 3.0 GHz models, which would easily close the performance gap in the case of the latter.
Don't forget that four Opteron 8220 chips cost substantially more than two Xeon 5355 chips. $1600 x 4 vs. $1200 x 2. Then again, price differences of a few thousand dollars aren't really a huge deal when we're talking about powerful servers. $25000 vs. $27000? Does it really matter if one is 20% faster?
One final point is that I've heard Opteron does substantially better in virtualized environments. Running 32 virtual servers on an 8-way Opteron box will apparently easily outperform 8-way Xeon Clovertown. But that's just hearsay - I haven't seen any specific benches outside of some AMD slides.
yyrkoon - Saturday, March 31, 2007 - link
Yes, and no. Having worked in a data center, you know these types of systems are often specialized for certain situations. That is why I said majority Intel, and a few high performance AMD. I really don't know what these people are getting riled up about . . .
I follow virtualization fairly closely, but I do not examine every_single_aspect. However, I can tell you right now that AMD at minimum does hold the advantage here, because their CPUs do not require a BIOS that is compatible with said technology; Intel CPUs currently do. As for the performance advantage, this could have to do with the Intel systems having to have their BIOS act as middle man. Also, last I read, FB-DIMMs were slower than DDR2 DIMMs, so perhaps this also plays a factor? Another thing: how many Intel boards out there support 4-8 processors? The reason I ask is that I haven't seen any recently, and this could also play a factor *shrug*
TA152H - Friday, March 30, 2007 - link
It makes one wonder why the processors were compared in the first place. Did you guys throw processors in a hat and then pull them out and decide to benchmark them against each other? Why not throw a Tualatin in there just for kicks?
OK, all sarcasm aside, does anyone actually think about these articles before they are written? It's not OK to put in a disclaimer to say you're making an unfair comparison, and then make it. I know it seems it is, but a lot of people don't read it, and there's an assumption, however false, that the article will be written with some common sense in it. A lot of people will see the charts, and that's what they'll base their reaction on. Is it their fault? To some extent, but still, you know they're going to do it, and they're going to get the false impression, and it's thus your fault for spreading misinformation (it amounts to this, even though it is qualified).
If you compared on cost, that would be one thing. If you compared within a market segment, still fair. If you compared the only quad core to the only quad core, I wouldn't like it, but I'd still think it was at least supportable. But when you compare a high-end 8-way Opteron with a low-end Clovertown, you get these reactions from people where they see AMD beating Intel and they consider it significant. Look at your responses; there is no doubt of this.
I'm not saying Opterons are vastly inferior in every situation, or I should say Opteron based systems, only that this article gives a false impression of that because of how people react to articles. They don't read every little thing, they don't even remember it, and they often walk away with a false impression because of poor choices on the part of the reviewers. People are people, you need to deal with that, however annoying it can be. But, even then, the choices were remarkably poor in that they tell a lot less than one based on closer competitors would have. The best Clovertown versus the best Opteron. The same power envelope. The same cost system. All are better choices.
I agree with your response, but the problem is, the charts speak a lot louder than responses, and disclaimers. That's why you put charts in, after all, to get attention to convey ideas effectively. When you do this with improper comparisons, I think you can see the inherent illogic in that while at the same time defending it by talking about disclaimers and such. Again, look at your responses here.
I also think the FB-DIMMs have a lot to do with Intel's relatively poor performance, and I don't think this was emphasized enough. It would be interesting to try to isolate how much of a performance penalty they have, although I don't know if this could be done precisely. Intel seems intent on using them more and more, and I fear they are heading into another RDRAM situation, where it may be a very good technology, but they introduce it too soon, it shows disadvantages, and people get a negative impression of it. Obviously, they aren't pushing it the way they did RDRAM, but it seems to come with a much greater performance penalty (the 840 actually performed as well as the 440BX, overall, although the single channel 820 was kind of poor) and the cost is pretty high too, although probably not as bad as RDRAM was.
One last tidbit of information about virtualization, since it's mentioned in the article. It's kind of ironic that such a poor selling machine had so much advanced technology in it, but the IBM RT PC not only paved the way for RISC based processors, but also had virtualization even back in 1986. AIX ran on top of the VRM (Virtual Resource Manager). The VRM was a complete real-time operating system with virtual memory management and I/O subsystem. With it, you could run any OS on top of it, (in practice, it was only AIX), and in fact several at the same time. In fact, it went even further with the VMI, which had a well-defined interface for things like I/O control, processor allocation functions, etc... I'm not sure what my point is, except that most of the "new" stuff today isn't new at all. Intel was talking about multicores in the early 1990s, in fact. I guess the trace-cache and double pumped ALUs were new, but their end product didn't seem to work that great :P.
Jason Clark - Friday, March 30, 2007 - link
First off, thanks for the feedback. We spent some time considering what to compare the Clovertown to, and ultimately made the decision to compare based on core count. Was it the right decision? We *think* so, but would have rather compared it to an equivalent part. Is it unfair? Yes. Do people skim read and make comments without having read the article? Sure. Would people have freaked out if we compared a Clovertown to a 4-way Socket F configuration? Absolutely.

The decision becomes one based on levels of "unfair"; either decision would have been unfair, which makes it pretty darn difficult to choose. It's a shame people don't read before commenting, although aren't just about all facets of life full of this? Your comment about comparing cost is a good one, although do you really think, given that people don't read power consumption numbers, that they'd read a cost-based graph? (Doubtful.)
The end game is that Intel made the right decision, and Clovertown is a great product because of that. We are as anxious as everyone else to see what happens with K8L, and then Penryn.
TA152H - Friday, March 30, 2007 - link
Jason,

I think the central premise of your remark is that it's not possible to choose completely equal setups in this instance, and someone would cry foul regardless of your choices because it was not possible to make such firm selections. I am going to proceed with my response based on this premise, and I apologize in advance if I am misunderstanding you.
I do agree with what you're saying, but on the other hand I think you could have made it much closer than it was. I don't agree you minimized the "unfair" factor as well as you could. In fact, the Opteron cost more, ran at much higher clock speeds, and used more power. I'm not even going to complain about FB-DIMMs, or the FSB limitations Intel systems have, because they are inherent to that design and I think are completely legitimate. The benefits of using a memory controller on the chipset are obvious enough in certain configurations (it's kind of odd no one ever brings them up and simply says the FSB is purely bad, but did anyone ever notice how much bigger the caches are on Intel's processors? Hmmmm, maybe the saved space from not having a memory controller on board? How about video cards accessing memory? Do you really want to make them use the memory controller on the CPU, or add redundant logic to do it without it?). I'm not saying the memory controller on the chipset is better, overall, just that it has advantages that are almost never brought up. However, that matters less and less as lithographies shrink and the size of the memory controller becomes less significant.
OK, back from the digression. I'm saying you should have compared something like a 2.66 GHz Clovertown to a 2.8 GHz Opteron setup, or taken a lower Opteron to compare with a 2.33 GHz Clovertown. You should stick with the same segment. To put it another way: you might have people that have "x" dollars to spend on a server, so you'd make a valid comparison based on price. It won't work for everyone, but it will for that segment, and the others can at least see the logic in it. Or how about people that have a power bracket they have to stay under? The same would apply to them, it would just be a different group (or some would fall into both). Or how about the guy that wants the fastest possible system and has a devil-may-care attitude towards energy, noise levels, and cost? Your comparison didn't relate well to any group I am aware of. As I mentioned, the Opteron uses more power and is more expensive, while the Clovertown does not represent the best Intel has for that segment.
So, I'm not talking about adding another chart that says something about being cost based. I'm saying compare valid processors in the first place, based on something that will be useful to a segment, as aforementioned, and create a whole bunch of useful charts instead of creating less useful ones and adding a chart at the end to somehow illustrate why they are less useful. I agree, most people won't read it, or even pay much mind to it if they did. That's why I think it's more important to make an intrinsic change to the review, rather than compare unequal processors and then show how unequal they are.
I'll try to preempt your most obvious response by saying I realize true equality is impossible in something like this. However, I think we can both agree that you could have gotten something closer than what was done. A lot closer, really.
JarredWalton - Friday, March 30, 2007 - link
This is something that almost always comes up in our IT server type reviews. There are numerous facets of this that people never seem to take into consideration. For example, given the cost, there is absolutely no way that we are going to go out and purchase any of this hardware. That means we're basically dependent upon our industry contacts to provide us with appropriate hardware.

Could we push harder for Intel to ship us different processors? Perhaps, but the simple fact of the matter is that Intel shipped us their 2.33 GHz Clovertown processors and AMD shipped us their 2.8 GHz Opteron processors. Intel also shipped us some of their lower clocked parts, which surprisingly didn't use much less power. Clovertown obviously launched quite a while ago, and Ross has been working on this article for some time, trying to come up with a set of reasonable benchmarks. Should we delay things further in order to try and get additional processors, or should we go with what we have?
That's the next problem: trying to figure out how to properly benchmark such systems. FB-DIMMs have some advantages over other potential memory configurations, particularly for enterprise situations where massive amounts of RAM are needed. We could almost certainly come up with benchmarks that show Opteron being significantly faster, or go the other way and show Intel being significantly faster -- it's all dependent upon the benchmark(s) we choose to run. I would assume that Ross took quite a bit of time in trying to come up with some representative benchmarks, but no benchmark is perfect for all situations.
Most of the remaining stuff you mention has been dealt with in previous articles at some point. Continuously repeating the advantages and disadvantages of each platform is redundant, which is why we always refer back to previous/related articles. We've talked about the penalties associated with using FB-DIMMs, we've talked about overall bus bandwidth, but in the end we're stuck with what is currently available, and speculating on how things might be improved by a different architecture is simply that: speculation.
The final point that people always seem to miss is that price really isn't a factor in high-end server setups like this, at least to a point. In many instances, neither is power consumption. Let's take power as an example:
In this particular testing, the quad Opteron system generally maxed out at around 500W while the dual Clovertown system maxed out at around 350W. 150W is certainly significant, but in the big scheme of things the power cost is not what's important. Let's just say that the company pays $0.10 per kWh, which is reasonable (and probably a bit high). Running 24/7, the total power cost differential in a year's time is a whopping $131.49. If the system is significantly faster, no IT department is really going to care. What they really care about, in some instances, is power density -- performance per watt. A lot of data centers have a maximum amount of power that they can supply (without costly upgrades to the power circuitry), so if they need a large number of servers for something like a supercomputer, they will often want the best performance per watt.
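For anyone who wants to verify that figure, the arithmetic is simple; here's a tiny sketch using the numbers above (365.25 days per year is assumed):

    # Annual cost of a 150W power difference at $0.10 per kWh, running 24/7.
    extra_watts    = 150
    rate_per_kwh   = 0.10
    hours_per_year = 24 * 365.25

    extra_kwh  = extra_watts / 1000 * hours_per_year   # ~1314.9 kWh per year
    extra_cost = extra_kwh * rate_per_kwh              # ~$131.49 per year
    print(f"Extra power cost per year: ${extra_cost:.2f}")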
Going back to price, that can be a very important factor in small to medium business situations. Once you start looking at octal core servers, or even larger configurations, typically prices scale exponentially. Dual socket servers are cheap, and single socket servers are now basically high-end workstations with a few tweaks. The jump from two sockets to four sockets is quite drastic in terms of price, so unless you truly need a lot of power in a single box many companies would end up spending less money if they purchased two dual socket servers instead of one quad socket server. (Unless of course the IT department leads them astray because they want to play with more expensive hardware....) So basically, you buy a $20,000 or more setup because you really need the performance.
As I mentioned above, looking at the price of these two configurations that we tested, the quad Opteron currently would cost a couple thousand dollars more. If the applications that you run perform significantly better with that configuration, does it really matter that quad Opteron is a bit more expensive? On the other hand, as I just finished explaining, a large cluster might very well prefer slightly slower performance per server and simply choose to run more servers (performance per watt).
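To illustrate the cluster angle with a rough sketch: under a fixed facility power budget, the lower-power server can win on aggregate throughput even if each box is individually slower. The per-server performance numbers and the power budget here are invented purely for illustration; only the 500W/350W figures come from the testing above:

    # Hypothetical fixed-power-budget comparison; perf units and budget are made up.
    power_budget_w = 50000                      # assumed total power available

    opteron_box = {"watts": 500, "perf": 1.2}   # faster per box (per the results above)
    xeon_box    = {"watts": 350, "perf": 1.0}   # slower per box, lower draw

    for name, box in (("quad Opteron", opteron_box), ("dual Clovertown", xeon_box)):
        count = power_budget_w // box["watts"]
        print(f"{name}: {count} boxes, aggregate perf {count * box['perf']:.0f}")
    # quad Opteron: 100 boxes -> 120; dual Clovertown: 142 boxes -> 142

So under those assumed numbers the slower, lower-power box delivers more total work within the same power envelope, which is exactly the performance-per-watt argument.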
While I am not the author of this article, I take it as a given that most of our articles are necessarily limited in scope. I would never consider anything that we publish to be the absolute definitive end word on any argument. In the world of enterprise servers, I consider the scope of our articles to be even more limited, simply because there are so many different things that businesses might want to do with their servers. The reality is that most businesses have to spend months devising their own testing scenarios in order to determine what option is the best for upgrades. That or they just ask IBM, Dell, HP, etc. and take whatever the vendor recommends. :|
TA152H - Friday, March 30, 2007 - link
Jarred,

Your remarks about power are off. I'm not sure if you guys are really understanding what I'm talking about, or are just arguing for the sake of arguing. People see the performance charts and assume you did a decent job of picking appropriate processors without reading the fine print. You didn't; you did a poor job of it, and people often miss that because they didn't take the time to read it. A lot of the remarks here are about exactly that. So, many people will read your performance charts and assume they are reasonably comparable parts, when they are not. Taking almost 50% more power isn't a reasonable comparison, sorry. It's a terrible one.
I'm not at all sure what you're talking about when you bring up the memory and the benchmarks. I had no complaints about them, and you are just stating the obvious when you say you can choose benchmarks that would make each processor look better. Benchmarks, taken alone, are always dangerous. People love things simplified, so they cling to them, but benchmarks never tell the whole story. So, I agree, but I have no idea why you bring it up.
With regard to stuff you have already brought up, are you saying you've pointed out the advantages of the memory controller on the chipset? I don't see that brought up much at all, and it was unrelated to this article. As I said, I digressed, but the impression I get from hobbyist sites like this is that they all say the integrated memory controller is so much better and the FSB is perfectly horrible and nothing but a bad thing. It's simply not true; integrated memory controllers have been around a long time, and I almost laugh when I see idiots saying how Intel will be copying AMD in doing this. Like AMD was the first to think of it. Or the point to point stuff. It's so uninformed, it's both comical and annoying. Intel knows all about this, as does every company making designs, and they each make different tradeoffs. Was it a mistake for Intel to wait as long as they did? I would say no, since the P8 walks away from the K8, and Intel obviously found really good uses for their transistor budget rather than put a memory controller on the chip. It's not like they don't know how; they obviously chose not to until the P9. One oddity with Intel chips is that the odd numbered ones almost always suck. 186 anyone? The 386 sucked badly; it wasn't any better than the 286 clock-normalized, and its claim to fame was running real mode elegantly. Pentium wasn't that bad, but still wasn't great and got super hot, and most of the performance was from making the memory bandwidth four times greater. Pentium 4? Oh my. What were they thinking???? P9 has a long, and bad, history :P.
The FB-DIMMs have been spoken about; that isn't what I was commenting on. I do think a lot of people are confused by how close the Opteron comes to the P8-based Xeons, even comparing just one dual core processor, when they are used to seeing a much greater difference in the desktop parts. It's not just the benchmarks; the FB-DIMMs have serious performance handicaps. I don't think it's mentioned enough, although I agree a full description of it would be superfluous.
My remarks about price were more in line with making an apples to apples comparison, where you compare things at the processor level so you could see comparative products. Price always matters, always, and always will, even at an end user level. I agree, the cost for four sockets is way too high for most applications, and thus they sell relatively poorly compared to dual socket systems. It's like comparing a Chevrolet Cavalier to a Dodge Viper on a number of tests, and then saying how the Viper costs more, and that's the nature of sports cars like that, so we should be OK with it. Bring out the Corvette so we can see a real comparison between the two companies, not lame excuses as to why you chose the wrong processors and how it really doesn't matter. They are just rationalizations, and no one will believe them. Cost does matter. Always has, always will. Why do you think IBM doesn't dominate the PC market anymore? They made great computers, but they cost those extra few hundred dollars more. Over thousands of machines, it adds up. And even if it didn't, why wouldn't you compare the 2.66 GHz Clovertown with the 2.8 GHz Opteron, which would have closer cost and would actually illustrate what was available for those people that really needed that performance? You talk about how they need performance and don't care about money, and then test a processor that is cheaper and doesn't have as good performance anyway! No contradiction there, huh?
OK, now I love to argue, so I am putting this last. You should have ended your argument with your accessibility to products, and I would have had nothing to say and that would be that. You are right, you can't buy it and Intel should have sent you better parts, and I would guess you actually did try to get the 2.66 Clovertown. No argument there. It's just the stuff after it that made no sense. But, thanks for the explanation, despite all the arguments I think are invalid, you did make one that I can completely understand and makes sense. Really, Intel is kind of stupid for not sending the better parts and improving their image, and if you can't get them, you can't get them and shouldn't pay for them.
JarredWalton - Friday, March 30, 2007 - link
I worked in an IT department for three years that bought extremely expensive servers that were often completely unnecessary. They just purchased the latest high-end Dell servers and figured that would be great. That's still what a lot of large enterprise businesses do. You mention desktops and IBM... and now who's bringing up irrelevant stuff? This article is only about servers, really, and with a limited amount of detail at that.

I also have no idea how you say my power remarks are "off". In what way? Cost? Importance? The calculations are accurate for the factors I stated. If a system is slower and cheaper but uses more power, it will take ages for the power cost to overcome the price difference in this particular instance. 150W more on a desktop is a huge deal, when you're talking about systems that probably cost $1500 at most. 150W more on a $20000 server only matters if you are near your power capacity. The company I worked for (at that location) had a power bill of about $50,000 per month. Think they care about $150 per server per year? Well, they only had about a dozen servers, so of course not. A shop that has thousands of servers would care more, but if each server that used less power cost $3000 more, and they only used servers for three years, again it would only be a critical factor if they were nearing their peak power intake.
In the end it seems like you don't like our graphs because they don't convey the whole story. If people don't read graphs properly and don't have the correct background, they will not draw the appropriate conclusions. End of line. When we specifically mention 2.66 and 3.0 GHz parts, even though these weren't available for testing, that's about as much as we can do right now. If I were writing this article, I'm certain I would have said a lot more, but as the managing editor I felt the basic information conveyed was enough.
The fact of the matter is that while Intel is now clearly ahead on the desktop, on servers it's not a cut-and-dried situation. AMD still has areas where they do well, but there are also areas where Intel does well. The two platforms are also wildly different at present, so any comparisons are going to end up with faults, short of testing about 50 configurations. I defend the article because people attack it, and I don't think they're looking at the big picture. Is the article flawed? In some cases, yes. Does it still convey useful information? I certainly think so, and that's why it was published.
We have no direct financial ties to either company (or we have ties to both companies if you want to look at it from a different perspective), and every reason to avoid skewing results (i.e. sites like us live and die by reputation). Almost every server comparison I've seen in recent years ends up with some of the faults you've pointed out, simply because the hardware is not something review sites can acquire on their own. They either differ on cost, performance segment, features, or some other area. It's the nature of the business.
TA152H - Saturday, March 31, 2007 - link
Jarred,

Your power remarks are off because you ignore the problems heat creates besides the electricity. Working for one company for three years isn't a huge sample set, you know. I have worked for a lot of big companies, and they are all a little different. With regard to heat, you have to evacuate that heat too. Air conditioning and raised floors can get expensive. Reliability is also impacted by heat. It's not a simple dollar amount based on what the processors use. Also, your argument is deceptive at best and meant to confuse. Did you choose these processors based on power/performance ratio? Did you pick the best from AMD and Intel in this regard and do the comparison? No, you didn't. You chose what you had.
A lot of times dumb people in IT buy really expensive stuff on purpose, to show they needed the entire IT budget that year; they waste it so it isn't cut the subsequent year. I have been part of that, so what you're saying doesn't surprise me. In fact, I have been part of situations where we bought stuff just to buy it, so we'd use up the entire allocation. I have also been in situations where every last dollar is scrutinized. My remarks about IBM were to illustrate that cost matters. Not to everyone, and sometimes in a reverse-logic way like I mentioned above, but for most people it does. If it didn't, why would AMD even bother selling mid range eight way Opterons, since everyone would want the best ones? Hmmmmm, it's not that the power usage is much lower, so what is it? I guess it's cost. Why is there more than one Itanium part from Intel when they are used almost exclusively in very expensive machines? Hmmmm, cost again? I think Intel and AMD have a pretty good grip on that, and they do sell multiple tier parts for even expensive segments. So, it does matter, although admittedly in individual cases in perverse ways.
I don't like your graphs because they don't convey any story correctly. I don't like that you compared wrong parts. I wouldn't expect graphs to tell the whole story, they can't, but I expect the part they do tell to make some sense. But, again, now I understand it was availability and I don't have a problem with it. Before, I was aghast at your seemingly horrible choice for the comparison. I'm not anymore.
The article probably does convey some useful information, my point was it could have conveyed a lot more with better choices. I don't have a problem with AMD being better at certain things, I knew that and accept that. I like it, in fact. I don't expect much from the P9 either, since Intel (strangely like Motorola with their 68K line) seems to have lousy odd number parts, and from what I'm reading it doesn't seem very good. I hope I'm wrong, but I have a bad feeling about it.
I never said you had any ties to either company, or even implied that you were intentionally favoring one over the other. Don't read anything into what I say; I say what I mean, and if I thought you were intentionally screwing Intel, I'd have said so. I just didn't buy the nonsense about how good the choices were when I knew they sucked, and you did too. You should have just been upfront about why, and I think everyone would have understood. It's certainly better than nothing, and those were your two choices. I can't argue with that. Well, I could, but it would be specious at best :P.
yyrkoon - Saturday, March 31, 2007 - link
I'd like to point out that I read the whole article and knew the differences. The simple fact of the matter is that what AnandTech was supplied with is what was tested. End of line. Period. Did Intel not send its higher performing parts because they also do not perform as well as the Opterons (unlikely, but possible)? Who knows. Unless you have the parts yourself to test, you will never know.

What next? "You didn't explore the fine reindeer effect"? Please . . .
Frumious1 - Friday, March 30, 2007 - link
About the only people that would really get upset by this type of article are fanboys or employees of Intel and/or AMD. In this case it would probably only be Intel, simply because this is a test scenario where Intel isn't substantially ahead. In fact, AMD is probably quite pleased to see some more "fair" results posted. I wouldn't argue much more, Jarred, without this guy specifically answering the following question.

Are you employed by Intel, Mr. TA152H, and is that why you're upset? Wasn't there a law passed recently about disclosure for such things being required? I remember http://www.anandtech.com/cpuchipsets/showdoc.aspx?...">another article where your posts smacked of guerrilla marketing, and this is the case again. You're upset that AT didn't tell the story Intel would like to see. At least that's how I see it.
TA152H - Saturday, March 31, 2007 - link
Well, actually, you're a dolt.

I actually have preferred AMD to Intel most of the time, and was instrumental in bringing them into our company when the Athlon came out, because we were doing computational fluid dynamics and the x87 on it was better.
I have had a lot of dealings with both companies, and actually like them both. I don't want Intel to dominate the way they did, but I don't like articles that are misleading either. I don't have any problem with people saying AMD is better when they are, but this type of article is bad. I can understand the problems acquiring parts, and had they said that in the first place, I would have agreed. I don't like the false arguments, which are wrong and misleading, that rationalize the real reason. They should have just told the truth: this is what we got, this is what we tested, instead of trying to hide it and saying they chose these parts because they were the best choices for the comparison. They weren't, that was my point, and I won't back off of it.
Viditor - Friday, March 30, 2007 - link
True, but from what we've seen when dual core Opterons came out, putting the cores on-die is a greater advantage than the bandwidth addition from the sockets.
By this I mean that the 1P/DC Opterons consistently beat out the 2P/SC Opterons on most everything (except bandwidth of course)...though the difference was only 5% or so as I recall.
Also, http://www.anandtech.com/showdoc.aspx?i=2897&p...">Johan's Article seems to suggest just these results...he states:
"Here is a first indication that quad core Xeon does not scale as well as the other systems. Two 2.4GHz Opteron 880 processors are as fast as one Xeon 5345, but four Opterons outperform the dual quad core Xeon by 16%. In other words, the quad Opteron system scales 31% better than the Xeon system"
ButterFlyEffect78 - Friday, March 30, 2007 - link
Thanks for clearing that up for me, because I thought it was a simple 2-socket quad core setup. So does this mean it will be possible to have 4x quad-core K10 in the future? That would be a 16-way Opteron box; will Intel have anything to compete with that?