SAP S&D 2-Tier

Operating System: Windows 2008 Enterprise Edition
Software: SAP ERP 6.0 Enhancement package 4
Benchmark software: Industry Standard benchmark version 2009
Typical error margin: Very low

 

The SAP SD (Sales and Distribution, 2-tier internet configuration) benchmark is interesting because it is a real-world client-server application. We decided to take a look at SAP's benchmark database. The results below were all obtained on Windows 2003 Enterprise Edition with an MS SQL Server 2005 database (both 64-bit). Every 2-tier Sales & Distribution benchmark was performed with SAP's latest ERP 6 enhancement package 4. These results are NOT comparable with any benchmark performed before 2009: the new 2009 version of the benchmark produces scores that are 25% lower. We analyzed the SAP benchmark in depth in one of our earlier articles. The profile of the benchmark has remained the same:

  • Very parallel resulting in excellent scaling
  • Low to medium IPC, mostly due to "branchy" code
  • Somewhat limited by memory bandwidth
  • Likes large caches (memory latency!)
  • Very sensitive to sync ("cache coherency") latency
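
The 25% gap between the pre-2009 and 2009 benchmark versions mentioned above can be turned into a rough conversion when an older score has to be put next to a newer one. The sketch below is only a back-of-the-envelope estimate: it assumes the 25% drop applies uniformly to every system, which is not guaranteed, and the function names are ours rather than anything SAP publishes. Official results from different benchmark versions remain not directly comparable.

```python
# Rough conversion between pre-2009 and 2009 SAP SD scores.
# Assumption: the "25% lower" figure from the text applies uniformly
# to every system, which is only an approximation.
SCORE_DROP_2009 = 0.25


def to_2009_scale(pre_2009_score: float) -> float:
    """Estimate what a pre-2009 score would look like on the 2009 version."""
    return pre_2009_score * (1.0 - SCORE_DROP_2009)


def to_pre_2009_scale(score_2009: float) -> float:
    """Estimate what a 2009 score would have been on the older version."""
    return score_2009 / (1.0 - SCORE_DROP_2009)


if __name__ == "__main__":
    # The input value is purely illustrative, not a published result.
    print(to_2009_scale(20000.0))
```

Any number produced this way should be marked as an estimate next to measured results.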

Chart: SAP Sales & Distribution 2 Tier benchmark ((*) = estimate)


The last time we discussed the SAP S&D 2-tier benchmark, we had to estimate the Xeon X5670 results. Since then, HP has benchmarked its latest G6 servers, giving us real results for the X5670. The performance is nothing short of astonishing: the dual Xeon X5670 outperforms a quad Opteron 8435 at 2.6 GHz. The Magny-Cours Opteron can only compete on the basis of its somewhat lower price, and we doubt that SAP buyers care about a few hundred dollars. A quad Opteron 6174 might have a chance against the Nehalem EX performance-wise, but the SAP market will probably prefer the extensive RAS feature list of the Xeon Nehalem EX. The ERP market is most likely going to be dominated by Intel-based servers.


58 Comments


  • zarjad - Friday, April 2, 2010 - link

    I understand that HT can be disabled in BIOS and that some benchmarks don't like HT.
  • elnexus - Wednesday, April 21, 2010 - link

    I can report that one of my customers, performing intensive image processing, found that DISABLING hyper-threading on a Nehalem-based workstation actually IMPROVED performance considerably.

    It seems that certain applications don't like hyper-threading, while others do. I always recommend that my customers perform sensitivity analyses on their computing tasks with HT on and off, and then use whichever is best.
  • tracerburnout - Wednesday, March 31, 2010 - link

    How is it possible that Intel's Xeon X5670 rig returns 19k+ for a score while AMD's Magny-Cours returns only 2k+?? I only question the results of this benchmark chart because Intel's Xeon X5570 rig returns only around 1k. How can an X5670 be 19x faster than an X5570?? And I doubt the same is true for the Magny-Cours by being just 10.5% of what the X5670 can do.

    (is there an extra '0' by accident in there?)



    tracerburnout
    proud supporter of AMD, with a few Intel rigs for Linux only
  • JohanAnandtech - Thursday, April 1, 2010 - link

    No, it is just that Sisoft uses the new AES instructions of Westmere. It is a forward-looking benchmark which tests only a small part of a larger website code base. So that 19x faster will probably result in 10 to 20% of the complete website being 19x faster. So the real performance impact will be a lot smaller. It is interesting though to see how much faster these dedicated SIMD instructions are on these kinds of workloads.
  • alpha754293 - Thursday, April 1, 2010 - link

    If you guys need help with setting up or running the Fluent/LS-DYNA benchmarks let me know.

    I see that you don't really spend as much time writing or tweaking it as you do with some of the other programs, and that to me is a little concerning only because I don't think that it is showing the true potential of these processors if you run it straight out-of-the-box (especially with Fluent).

    Fluent tends to have a LOT of iterations, but it also tends to short-stroke the CPU (i.e. the time required to complete all of the calculations necessary is less than 1 second and therefore doesn't make full use of the computational ability).

    Also, the parallelization method (MPICH2 vs. HP MPI) makes a difference in the results.

    You want to make sure that the CPUs are fully loaded for a period of time such that at each iteration, there is a noticeable dwell time AT 100% CPU load. Otherwise, it won't really demonstrate the computational ability.

    With LS-DYNA, it also makes a difference whether it's SMP parallelization or MPP parallelization as well.
  • k_sarnath - Friday, April 2, 2010 - link

    The most baffling part is how Linux could engage 12 CPUs much better than Windows. I am obviously curious about the OS platform for the other tests... Similarly, MS SQL was able to scale well on multi-cores... In this context, I am not sure how we can look at the performance numbers... A badly scaling app or OS could show the 12-core one in a bad light.
  • OneEng - Saturday, April 3, 2010 - link

    Hi Johan,

    I have followed your articles from the early days at Ace's and have a good respect for the technical accuracy of your articles.

    It appears that the X5570 gains very little when scaling from 4 to 8 cores in the Oracle Calling Circle benchmark. Furthermore, the 24 cores of MC at 2.2 GHz are way behind. Westmere appears to do quite well, but really should not be able to best 8 cores in the X5570 with all else being equal.

    I have heard some state that the benchmark is thread bound to a low number of threads (don't know if I am buying this), but surely something fishy is going on here.

    It appears that there is either a real world application limit to core scaling on certain types of Oracle database applications (if there are, could you please explain what features an app has when these limits appear), or that the benchmark is flawed in some way.

    I have a good amount of experience in Oracle applications and have usually found that more cores and more memory make Oracle happy. My experience seems at odds with your latest benchmarks.

    Any feedback would be appreciated .... Thanks!
  • JohanAnandtech - Tuesday, April 6, 2010 - link

    I am starting to suspect the same. I am going to dissect the benchmark soon to see what is up. It is not disk related, or at least that is surely not our biggest problem. Our benchmark might not be far from the truth though: I think Oracle really likes the big L3 cache of the Westmere CPU.

    If you have other ideas, mail at johanATthiswebsiteP
  • heliosblitz2 - Wednesday, April 7, 2010 - link

    You wrote
    Test-Setup:
    Xeon Server 1: ASUS RS700-E6/RS4 barebone
    Dual Intel Xeon "Gainestown" X5570 2.93GHz, Dual Intel Xeon “Westmere” X5670 2.93 GHz
    6x4GB (24GB) ECC Registered DDR3-1333

    "Also notice that the new Xeon 5600 handles DDR3-1333 a lot more efficiently. We measured 15% higher bandwidth from exactly the same DDR3-1333 DIMMs compared to the older Xeon 5570."

    That is not exactly the reason, I think.
    The reason is that you populated the second memory bank in both setups.
    Intel specification:
    Westmere 1333 MHz CPUs run at 1333 MHz with the second bank populated, while
    Nehalem 1333 MHz CPUs run at 1066 MHz with the second bank populated

    That could be updated.

    Compare tech docs on Intel site: datasheet Xeon 5500 Part 2 and datasheet Xeon 5600 Part 2

    Arnold.
  • gonerogue - Saturday, April 10, 2010 - link

    The Viper is a V10 and most certainly not a traditional muscle car ;)
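
The reasoning in JohanAnandtech's reply about the AES result in the thread above is essentially Amdahl's law: if only a fraction of the total work gets 19x faster, the overall gain is far smaller than 19x. A minimal sketch of that arithmetic, assuming the 10% to 20% share quoted in the reply; the function name is ours and is not part of any benchmark tool:

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: overall speedup when only `fraction` of the
    original run time is accelerated by `local_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    # The untouched part keeps its time; the accelerated part shrinks.
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# 10% and 20% are the shares quoted in the reply; 19x is the AES result.
for fraction in (0.10, 0.20):
    gain = overall_speedup(fraction, 19.0)
    print(f"{fraction:.0%} of the work 19x faster -> {gain:.2f}x overall")
```

Under those assumptions the whole site ends up roughly 1.10x to 1.23x faster, which matches the point that the real-world impact is much smaller than the headline 19x.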
