40 Comments

  • Amiga500 - Thursday, April 10, 2008 - link

    Do you intend to run solely server benchmarks, or also compare typical HPC software?
  • JohanAnandtech - Thursday, April 10, 2008 - link

    At this point, we understand the Linpack benchmark very well, so most likely we'll include Linpack. If there are other HPC benchmarks that do not require a complex setup or tens of hours of benchmarking, I am open to suggestions.
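
    For anyone wanting a rough Linpack-style number without a full HPL setup, a crude single-node stand-in can be timed in a few lines (a sketch only, assuming Python with NumPy; real HPL is MPI-based and heavily tuned, so this gives just a ballpark figure):

        # Time a dense LU solve and convert to GFLOPS.
        # This is only a ballpark stand-in for Linpack/HPL.
        import time
        import numpy as np

        n = 4096                          # problem size (arbitrary choice)
        a = np.random.rand(n, n)
        b = np.random.rand(n)

        t0 = time.time()
        x = np.linalg.solve(a, b)         # LU factorization + solve
        elapsed = time.time() - t0

        flops = (2.0 / 3.0) * n ** 3      # standard LU operation count
        print("%.1f GFLOPS" % (flops / elapsed / 1e9))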
  • Amiga500 - Thursday, April 10, 2008 - link

    The NAS benchmarks might be nice, but it would take you some time to get through the paperwork etc. to actually get hold of the programs, never mind setting them up and running them (although I think some are pretty much tuning-free).

    It's probably not worth the hassle, but if you're interested:

    http://www.nas.nasa.gov/Resources/Software/npb.htm...

  • americantabloid - Thursday, April 10, 2008 - link

    Hi, will you look into the Sun T2 and compare it to Intel and AMD as you previously did with the T1?

    Best regards
    at
  • JohanAnandtech - Thursday, April 10, 2008 - link

    Not in the short term. The problem is that server + virtualization testing is pretty complex. We first want to get it right with VMware/Xen/Hyper-V on x86 before we try out Sun's T2. VMware/Hyper-V surely won't run on the T2, and I would be amazed if Xen already ran well on it.

    So a T2 and x86 comparison would mean that we also need to include yet another Hypervisor and OS (we have a bit of experience in Solaris, but not much), which is a bit beyond our available expertise and manpower :-).
  • somedude1234 - Thursday, April 10, 2008 - link

    I'm pricing out a few 2P servers for storage application testing.

    I'm looking to future-proof with PCIe 2.0 for the 8 Gbps Fibre Channel, 10 Gbps Ethernet, and 6 Gbps SAS adapters coming up.

    For Xeon, the 5400 series CPUs and chipset are the only choice. I'm amazed how much of a server you can get for $2.5k today, even if you have to pay for FB-DIMMs.

    What are my options on the Opteron end of things? I don't see anything from either HP or Supermicro for 2P Barcelona + PCIe 2.0 systems.
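
    As a quick sanity check on whether those adapters actually need PCIe 2.0 (a back-of-envelope sketch; the encoding overhead figure is the standard 8b/10b used by PCIe 1.x/2.0):

        # Per-direction PCIe bandwidth vs. adapter line rates.
        # PCIe 1.x/2.0 use 8b/10b encoding: usable bytes/s = GT/s * 0.8 / 8.
        def pcie_gb_s(lanes, gt_s):
            return lanes * gt_s * 0.8 / 8      # GB/s per direction

        print("PCIe 1.1 x4: %.1f GB/s" % pcie_gb_s(4, 2.5))   # 1.0 GB/s
        print("PCIe 2.0 x4: %.1f GB/s" % pcie_gb_s(4, 5.0))   # 2.0 GB/s

        # Adapter needs, per direction:
        print("10 GbE:      %.2f GB/s" % (10 / 8.0))          # 1.25 GB/s
        print("8G FC:       %.2f GB/s" % (8 * 0.8 / 8))       # ~0.80 GB/s

    So a dual-port 10 GbE or 8G FC card is exactly the case where a PCIe 2.0 x4/x8 slot starts to matter.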
  • jefmes - Tuesday, April 15, 2008 - link

    I'm not sure about it being PCIe 2.0, but take a look at the DL385 G5:

    http://h10010.www1.hp.com/wwpc/us/en/sm/WF05a/1535...

    We've been using the DL380 G5 for VMware ESX 3.02 for a while now, and it's become my favorite server for most deployments. The DL385 G5 looks to be nearly the same, but AMD-flavored.
  • GregD - Wednesday, April 9, 2008 - link

    Has everyone seen this?
    http://h18004.www1.hp.com/products/servers/prolian...
  • dijuremo - Wednesday, April 9, 2008 - link

    Minus 1 point for being so bad at math... 8 CPUs * 4 cores/CPU = 32 cores, not 64.
  • GregD - Thursday, April 10, 2008 - link

    Sorry, you're right - I was just reading about the 8-core AMDs that are coming out in 2009 - got a little ahead of myself.
  • tshen83 - Wednesday, April 9, 2008 - link

    The Opteron is indeed more impressive than Intel's offering, simply because of the memory controller and its scalability.

    However, one thing must be said: the 4S market will be dying off because 2S systems are the most economical from a price/watt/dollar perspective. In the 2S market, Intel's Xeons are priced at parity with its Core 2 products, which makes them very attractive. Let's face it, Google is buying 2S systems from Intel; the decision must be right.

    Scalability is now solved in software through clustering techniques such as the Hadoop and Google MapReduce programming paradigms.
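
    For readers unfamiliar with the paradigm tshen83 is referring to, the map/reduce model is easy to illustrate with a toy single-process word count (a sketch only; real Hadoop/MapReduce runs the same phases across many machines, which is where the horizontal scaling comes from):

        # Toy map/reduce: word count in one process.
        from collections import defaultdict

        docs = ["the quick brown fox", "the lazy dog", "the fox"]

        # Map phase: emit (word, 1) pairs.
        mapped = [(w, 1) for d in docs for w in d.split()]

        # Shuffle phase: group the values by key.
        groups = defaultdict(list)
        for word, one in mapped:
            groups[word].append(one)

        # Reduce phase: sum the counts for each word.
        counts = {word: sum(ones) for word, ones in groups.items()}
        print(counts)   # {'the': 3, 'quick': 1, 'brown': 1, ...}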
  • Starglider - Thursday, April 10, 2008 - link

    > However, one thing must be said: the 4S market will be
    > dying off because 2S systems are the most economical from
    > a price/watt/dollar perspective.

    Maybe if you have copious rackspace. For many users rackspace is at a premium; decent co-lo is expensive. We're about to put in a couple of new 16-core servers, possibly Opterons (which is why I'm looking forward to this review). The reason is that it's the maximum CPU power you can cram into 1U (without going to blades, which aren't appropriate for us ATM).
  • tshen83 - Thursday, April 10, 2008 - link

    "Maybe if you have copious rackspace. For may users rackspace is at a premium"

    So you pay a 4x premium on the CPUs to save 1U of space? That's about the best argument I have heard.

    If you are so cash-strapped that you have to put 16 CPUs in 1U, you obviously have no idea about data center cooling requirements. Not many data centers currently give you that density. A 4S quad Opteron server would draw about 800-1000W (125W*4 + 10W per stick of memory *16 sticks + 12W per hard drive *8 drives). That is between 6-8A of current per 1U. There is no data center I am aware of that will provision 300A+ per rack (40U * 8A per U). Most datacenters will give you 20A for the whole rack. So you can probably put 3 of those systems in an entire rack, which makes the 1U form factor meaningless.
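
    For what it's worth, those per-component figures work out as follows (a sketch taking the poster's numbers at face value and assuming an 80%-efficient power supply; both the numbers and the efficiency are disputed further down the thread):

        # Sum of the quoted component draws for a 4S quad-core Opteron 1U.
        cpus  = 4 * 125    # 125W TDP per socket
        dimms = 16 * 10    # 10W per memory stick
        disks = 8 * 12     # 12W per hard drive
        dc_load = cpus + dimms + disks        # 756W at the components

        wall = dc_load / 0.80                 # ~945W assuming 80% PSU efficiency
        for volts in (120, 208):
            print("%dV: %.1fA" % (volts, wall / volts))   # ~7.9A / ~4.5A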
  • joekraska - Sunday, April 13, 2008 - link

    > Most datacenters will give you 20A for the whole rack.

    Mmmm. Older data centers might feature dual 208V/20A or dual 115V/30A, but the trend is now away from this.

    Defaults are now dual 208V/30A, and that's only for "low density".

    For high density, you'll see AT LEAST four L6-30s above the rack now, and you'll start seeing two to four L21-30s (30A, 3-phase) or even 35A 3-phase to the rack quite frequently with newer high-density deployments.

    Frankly, powering high-density deployments is getting so challenging that I'm expecting >35A 3-phase within 24 months. How we avoid killing data center facilities workers, I haven't figured out yet. :-)
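
    For scale, the usable power behind the circuits Joe lists works out roughly as follows (a sketch assuming the usual 80% continuous-load derating; L6-30 and L21-30 are taken at 208V):

        import math

        # Usable kW per circuit at 80% continuous-load derating.
        def single_phase_kw(volts, amps):
            return volts * amps * 0.8 / 1000

        def three_phase_kw(volts, amps):
            return volts * amps * math.sqrt(3) * 0.8 / 1000

        print("208V/20A:        %.1f kW" % single_phase_kw(208, 20))  # ~3.3 kW
        print("L6-30 (208/30):  %.1f kW" % single_phase_kw(208, 30))  # ~5.0 kW
        print("L21-30 3-phase:  %.1f kW" % three_phase_kw(208, 30))   # ~8.6 kW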

    Anyway, whichever poster said the cost economics aren't there for 4S servers was correct, at least up until the present. For example, if you are buying a Dell R900 over two Dell 2950 IIIs to run virtual servers, you are making a grave financial error. It will, however, be interesting to see if AMD's newer system can change that equation.

    Joe.
  • DigitalFreak - Thursday, April 10, 2008 - link

    And the BS keeps on a-comin.
  • Creig - Thursday, April 10, 2008 - link

    I think you need to shut up now...
  • Starglider - Thursday, April 10, 2008 - link

    > So you pay a 4x premium on the CPUs to save 1U of space?

    The 8-socket Opterons are between two and three times as expensive as the 2-socket ones depending on model.

    > A 4S quad Opteron server would draw about 800-1000W (125W*4
    > + 10W per stick of memory *16 sticks + 12W per hard drive
    > *8 drives)

    That is ridiculous. 125W is a theoretical maximum; in practice even loaded CPUs are unlikely to exceed 80W. Registered DDR2 draws about 4 watts a stick, not 10 (were you thinking of FB-DIMMs?). I doubt it's physically possible to get 8 3.5" drives into a 1U case along with a 4S motherboard; the most I've seen is 4, though 12W is a reasonable loaded power draw. Including motherboard and fan power draw and power supply inefficiency, that's about 600W for a fully loaded server, which is two and a half amps (yes, I live in a country with a sane mains supply voltage). Of course, full load is only experienced for a few hours a day; most of the time the power draw would be below 300W.
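
    Starglider's own numbers add up as claimed (a sketch; the mainboard/fan figure and the PSU efficiency here are assumptions, not from his post):

        # Loaded draw with realistic per-component figures.
        cpus  = 4 * 80     # ~80W per loaded CPU, not the 125W TDP
        dimms = 16 * 4     # registered DDR2, ~4W per stick
        disks = 4 * 12     # 4 drives, the realistic 1U maximum
        misc  = 75         # mainboard + fans (assumed)

        wall = (cpus + dimms + disks + misc) / 0.85   # ~85% PSU efficiency assumed
        print("%.0fW, %.1fA at 240V" % (wall, wall / 240))   # ~600W, ~2.5A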

    > Most datacenters will give you 20A for the whole rack.

    I don't know what worthless provider you're using that can only manage 2.5 kW/rack, but ours allows 400 watts per 1U server. We may have to ask for an upgrade to that if low-voltage chips aren't an option.
  • Justin Case - Wednesday, April 9, 2008 - link

    Assuming you need the processing power of 16 CPU cores, how is a single 16-core system less efficient per watt / dollar than two separate 8-core systems? A single "big" system lets you dynamically allocate resources (either within a single OS or through virtualization) and run much more efficiently. Not to mention it's cheaper (the only element that's more expensive is the main board, everything else costs the same, and doesn't need to be duplicated).

    As to "scalability as been solved", maybe you should tell that to all the "morons" in the HPC field that are still using (and planning, and designing, and putting together) supercomputers...?
  • JohanAnandtech - Wednesday, April 9, 2008 - link

    Google is a special case, as most of the applications they run have almost infinite scalability. However, a lot of servers are also sold to consolidate smaller ones onto. 4-socket systems might then be much more interesting than dual-socket ones: you get twice the memory, twice the CPU power, and a whole set of other "big box" advantages, such as easier serviceability (adding another network or disk controller), etc.

    4-socket systems probably have a good future ahead thanks to virtualization, and the fact that they have also gotten smaller: you can get 4-socket systems in 2U (with lots of expandability), in 1U, and even as blades.
  • tshen83 - Wednesday, April 9, 2008 - link

    Johan:

    If your argument is that Google's and Facebook's decision to buy 2S systems is a special case, then you are seriously mistaken. 4S systems have a very small niche market.

    That niche market is customers who depend on non-clustered software packages that can scale vertically to 16 cores. There aren't many types of software that cannot cluster but can scale linearly on 16 cores; databases and virtualization are the only two types that scale vertically.

    For database apps, RAM rules, and memcache basically took care of distributed memory caching for databases, so the need for an extremely large amount of memory locally on the DB server is now mitigated. Plus, given the pricing on the 4S CPUs, it is better to buy four 2S nodes and use DB clustering (the MySQL NDB engine) or simply a master-slave configuration to handle load than to have a single big box. You needed redundancy anyway.
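
    The memcache usage he's describing is the standard cache-aside pattern in front of the database; a minimal sketch (the cache dict and db_query function are hypothetical stand-ins, not a real memcached client API):

        # Cache-aside: check the distributed cache first, fall back to the
        # database on a miss, then populate the cache. The hot working set
        # ends up spread across cheap 2S nodes' RAM instead of requiring one
        # big-memory 4S database server.
        cache = {}                     # stand-in for a memcached cluster

        def db_query(key):
            return "row-for-" + key    # stand-in for a real DB lookup

        def get(key):
            if key in cache:           # hit: no database round trip
                return cache[key]
            value = db_query(key)      # miss: go to the database
            cache[key] = value         # populate for the next reader
            return value

        print(get("user:42"))   # miss, hits the DB
        print(get("user:42"))   # hit, served from cache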

    For virtualization, I have not seen any virtualization software package that can virtualize multi-CPU systems reliably, simply due to CPU cycle contention issues. What I mean is that, ideally, you want to be able to virtualize 16 systems, each with 16 CPUs, on a 16-CPU system. In that setup, every virtualized system has access to a maximum of 16 cores, so every virtualized system can potentially take over the entire server should its load increase. Right now, all virtualization software is good at virtualizing a 1-CPU machine as a single thread on the host OS scheduler. Even VMware's 2-CPU support is buggy, and you will notice performance degradation when the total number of virtualized CPUs is greater than the number of physical CPUs you have. That limits the potential power of the virtualized system.

    On the hardware side, you failed to mention the nonlinear scaling of the Opterons at the 4S level (Intel too, for that matter). The only chipsets I am aware of that can support 4S and 8S Opterons are from Nvidia and, as far as I know, are HyperTransport 1 based. You need HyperTransport 3's bandwidth to be able to scale Opterons to 4S linearly.

  • JohanAnandtech - Thursday, April 10, 2008 - link

    "You need HyperTransport 3's bandwidth to be able to scale Opterons to 4S linearly."

    What do you base this on? HT 1.0 at 1 GHz has 8 GB/s full duplex (16 GB/s) available, used only for syncs and accessing remote memory between the CPUs. Many applications now optimize for NUMA, keeping data close to the processing node, which eliminates a lot of inter-CPU communication.
  • tshen83 - Thursday, April 10, 2008 - link

    Look, I really don't have much time to argue with you. For a system reviewer, I thought you would be more versed in the technical specifications.

    The 8000 series Opterons have 3 coherent HT links, two of which are used for inter-processor communication and one for the connection to the chipset. That means in a 4S system, any processor can directly talk to 2 other processors via an HT 1.0 link at 8GB/sec, so there is a 1/4 chance that a processor has to hop twice to get to the last CPU. Granted, NUMA makes that less of a problem. However, in any memory-intensive application, a database for example, where the entire dataset is cached in the massive memory, the one extra hop is a pain in the butt. Plus, an 8GB/sec link isn't exactly a good match for the 12.8GB/sec (DDR2-800) or 10.6GB/sec (DDR2-667) memory bandwidth each CPU's controller can deliver.
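
    The topology he's describing is the classic 4-socket "square": each CPU has direct links to two neighbours and none to the diagonally opposite socket. A small sketch confirms the hop counts:

        # Hop counts over the 4S Opteron square topology (no diagonal links).
        from collections import deque

        links = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

        def hops(src, dst):
            # breadth-first search over the HT link graph
            seen, queue = {src}, deque([(src, 0)])
            while queue:
                node, dist = queue.popleft()
                if node == dst:
                    return dist
                for nxt in links[node]:
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, dist + 1))

        for dst in (1, 2, 3):
            print("CPU0 -> CPU%d: %d hop(s)" % (dst, hops(0, dst)))
        # two sockets are 1 hop away; the diagonal socket is 2 hops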

    It is a very easy thing to test. Why don't you benchmark the system with 4S and then compare it to the result with two of the sockets turned off, and see if the results are linear? I will tell you they aren't. You are getting 50% scaling for the last 2 CPUs.
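
    The experiment he proposes boils down to one number (a sketch; the throughput scores below are placeholders matching his 50% claim, not measurements):

        # Scaling efficiency from a 2S-enabled vs. 4S-enabled run.
        def scaling_efficiency(score_2s, score_4s):
            speedup = score_4s / float(score_2s)
            return speedup / 2.0       # 1.0 means perfectly linear

        # "50% scaling for the last 2 CPUs": 4S gives 1.5x the 2S score.
        print(scaling_efficiency(100, 150))   # 0.75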
  • MGSsancho - Saturday, April 12, 2008 - link

    http://en.wikipedia.org/wiki/AMD_Horus

    The Horus chip links HT groups to make a 32-way system; it's used in the backplane of chassis. Also, Cray uses their SeaStar chip to make supercomputers: http://www.cray.com/products/xt4/index.html. Last I checked they have a good business.

    HP is not a small corporation. They would not invest the resources to make this product if they did not think it could sell well.
  • JohanAnandtech - Thursday, April 10, 2008 - link

    "You need HyperTransport 3's bandwidth to be able to scale Opterons to 4S linearly. "

    "Look, I really don't have much time to argue with you. For a system reviewer, I thought you should be more versed in the technical specifications. "

    Your original statement is so oversimplified that the moment I challenge it, I am suddenly not versed in the technical specifications? That makes no sense.

    Anyway, HT 3.0 will help, but it won't make Opterons scale linearly. How well a system scales depends on the software, as you are well aware. And the reason I challenged your statement is that I would like to see some benchmarks in which scaling really is so much better.

    Your original statement says that the limitations of HT 1.0 are automatically what keeps the Opteron from scaling in 4S. That has been proven in hard numbers in the 8S space, but I recall no such numbers for the 4S space.


    " However, in any memory intensive application, Database for example, where the entire dataset is cached in the massive memory, the one extra hop is a pain in the butt. Plus 8GB/sec link isn't exactly good match for the 12.8GB/sec(DDR2-800) or 10.6GB(DDR2-667) memory controller each CPU can do. "

    That 12.8 GB/s is half duplex. And since the memory bus is a lot less efficient, I wouldn't be surprised if it were a good match.

    "It is a very easy thing to do. Why don't you benchmark the system with 4S and compare it to the result with 2 of the Sockets turned off and see if the results are linear. I will tell you it isn't. You are getting a 50% scaling for the last 2 CPUs. "

    With all due respect, what is that going to prove? That it is harder to scale your software with more CPUs? There are many other explanations besides HT 1.0 limitations.

    It is certainly not going to prove that you have a bandwidth limitation with HT 1.0 in most software.

    Yes, one less hop with HT 3.0 is going to help. But it is not going to make software scale linearly. And depending on the software, the results of switching from HT 1.0 to 3.0 will range from "hardly measurable" to "very significant".
  • tshen83 - Thursday, April 10, 2008 - link

    I don't work for Intel or AMD, but I can tell you for sure that distributed system designs will overpower vertical scaling. AMD simply sent you 4 CPUs (about 6,000 dollars' worth) so you could write an article on AnandTech to pump the fact that the TLB issue is solved.

    Although software cannot scale linearly, you can still test linear scalability by running multiple instances of the same software and taking the average composite score the software gives you. You will be surprised how bad the scaling is past 2S. In fact, the only good thing about 4S and 8S is the independent memory controller that each Opteron comes with, so you can load a ton of memory into the server for an in-memory database and get quasi-log(n) scaling of the memory subsystem. As I have said, distributed memcache took that advantage away too.

    Let me conclude this argument by saying that there is a reason why you write articles about systems. You should really just follow what Google's doing. They are buying 2S systems from Intel. End of argument.
  • DigitalFreak - Thursday, April 10, 2008 - link

    I thought you didn't have any more time to argue?
  • JohanAnandtech - Thursday, April 10, 2008 - link

    "Let me conclude this argument by saying that there is a reason why you write articles about systems. You should really just follow what Google's doing. They are buying 2S systems from Intel. End of argument. "

    Google is using SATA disks for their database apps. Does this mean that SAS disks are not worth considering for a database app?

    "there is a reason why you write articles about systems."
    I would love to hear that reason... You have a tendency to generalize, so I am expecting something like that again.
  • tshen83 - Wednesday, April 9, 2008 - link

    What I am trying to say is that if you need to buy 4S and 8S systems to scale, your software infrastructure is already behind the curve, and it is going to be prohibitively expensive from this point on to scale vertically. I don't think any DB or virtualization system admin should run to their boss and say "let's dump 30K on a 4S server because it is the EASIEST way to scale", instead of spending some money on a software engineer to tweak the system to scale horizontally from that point on and getting ten $3K 2S nodes.
  • TheJian - Wednesday, April 9, 2008 - link

    So that means Intel has a FIREHEADED 7350 at 130W then, right? Heck, you've got it in the same list, and you make a comment like that about AMD? Saying Intel has already hit 3 GHz in Xeons but leaving out what TDP they run at is a bit misleading. I feel like I'm being led to believe Intel's 3 GHz chips are not hotheaded. I'm not saying I love either; I'm just pointing out that those kinds of statements are sensational at best.
  • DigitalFreak - Thursday, April 10, 2008 - link

    Down, fanboy.
  • JohanAnandtech - Thursday, April 10, 2008 - link

    Intel's Xeon 5472 runs at 3 GHz within an 80W TDP, so there is nothing misleading there. And yes, I agree that Intel's 130W models are hotheaded too. If you look at the article, you'll see that I was trying to contrast the current 2.3 GHz and the 2.4/2.5 GHz SE AMD CPUs: just 100 MHz more, and you get a 31% increase in power (and in reality it is probably more).
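
    For reference, the 31% follows directly from the TDP ratings (a sketch assuming the 95W standard and 125W SE figures for Barcelona):

        # Power vs. clock gain going from the 2.3 GHz part to the 2.4 GHz SE.
        standard_tdp, se_tdp = 95.0, 125.0
        print("%.1f%% more power" % ((se_tdp / standard_tdp - 1) * 100))  # 31.6%
        print("%.1f%% more clock" % ((2.4 / 2.3 - 1) * 100))              # ~4.3%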
  • highlandsun - Wednesday, April 9, 2008 - link

    Let me know if you want help setting up an LDAP benchmark.
  • JohanAnandtech - Wednesday, April 9, 2008 - link

    Hi,

    Please contact me on my mail: johan at anandtech.com
  • ap90033 - Wednesday, April 9, 2008 - link

    All but one of the AMD chips use more power. I'm interested to see how these CPUs REALLY perform. If they are the same or slower, then why bother? They are playing catch-up and need something a lot better.
  • Justin Case - Wednesday, April 9, 2008 - link

    AMD and Intel calculate "power" differently. AMD calculates the maximum power the chip could possibly draw (based on transistor count), Intel calculates the "maximum power use under intensive load", whatever that means. Also, AMD CPUs have a built-in memory controller, which is in the northbridge for Intel systems. In other words, comparing CPU power figures is irrelevant; you always need to compare (at least) CPU + MB + RAM.

    As to "why bother", well, from the point of view of a consumer (which I suspect is not even your case, for these models), it's a great thing they "bother". If one of the manufacturers were left without competition, prices would skyrocket (like they did with Xeons before the Athlon MP / Opteron, and like they did with the Opteron in the PresHott days).
  • duploxxx - Wednesday, April 9, 2008 - link

    Don't look at CPU power consumption only, certainly not in a Tigerton config. You have massive power consumption from the FB-DIMMs (since you don't want a 16-core server with only 8-16GB of RAM), and the chipset power is also a lot.

    @Anand: how are you going to test virtualization? VMmark?

    Nice to see you take up the bench right from the start. Even with only 2.3-2.5 GHz Barcelonas, Intel knows they are in deep 4P trouble... and the Nehalem 4P release is indeed for 2009, but that is very optimistic; it's certainly not on the Q1-Q2 2009 roadmap yet...

    When will you do a 2S comparison?
  • DigitalFreak - Wednesday, April 9, 2008 - link

    "nice to see you take the bench right from the start, even with only 2,3-2,5 barcelone Intel knows they are in deep 4p trouble... and nehalem 4p release is indeed for 2009, but that is very optimistic, for sure its not on the q1-q2 2009 roadmap yet...."

    I see the stupid fanboys are alive and well in the server arena too...
  • duploxxx - Thursday, April 10, 2008 - link

    No, just the fools who have no clue about server performance.
  • DigitalFreak - Thursday, April 10, 2008 - link

    Like you?
  • Nehemoth - Wednesday, April 9, 2008 - link

    I would like to see benchmarks inside a virtualized server; after all, this is the real way in which we would use virtualization.
