The ARM Based Challengers

Calxeda, AppliedMicro and ARM – in that order – have been talking about ARM based servers for years now. There were rumors about Facebook adopting ARM servers back in 2010.

Calxeda was the first to release a real server, the Boston Viridis, launched at the beginning of 2013. The Calxeda ECX-1000 was based on a quad Cortex-A9 with 4MB L2. It was pretty slow in most workloads, but it was incredibly energy efficient. We found it to be a decent CPU for low-end web workloads. Intel's alternative, the S1260, was in theory faster, but it was outperformed in real server workloads by 20-40% and needed almost twice as much power (15W versus 8.3W).
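To put the efficiency claim in numbers, here is a rough back-of-the-envelope sketch using only the figures quoted above. It assumes the 20-40% performance lead and the two power figures apply to the same workloads, which the text does not state explicitly; the function name is just for illustration.

```python
# Power figures quoted in the text; whether they describe the SoC alone
# or a whole node is not stated, so treat the result as indicative only.
ECX1000_WATTS = 8.3
S1260_WATTS = 15.0


def perf_per_watt_advantage(perf_lead: float) -> float:
    """Return how many times better the ECX-1000's performance/watt is,
    given its fractional performance lead over the S1260 (0.20 = 20%)."""
    relative_perf = 1.0 + perf_lead                  # ECX-1000 performance, S1260 = 1.0
    relative_power = ECX1000_WATTS / S1260_WATTS     # ECX-1000 power, S1260 = 1.0
    return relative_perf / relative_power


for lead in (0.20, 0.40):
    print(f"{lead:.0%} faster -> {perf_per_watt_advantage(lead):.2f}x performance/watt")
```

Under those assumptions the ECX-1000 comes out at roughly 2.2 to 2.5 times the performance/watt of the S1260.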

Unfortunately, the single-threaded performance of the Cortex-A9 was too low. As a result, you needed quite a bit of expensive hardware to compete with a simple dual socket low power Xeon running VMs. About 20 nodes (5 daughter cards) of micro servers or 80 cores were necessary to compete with two octal-core Xeons. The fact that we could use 24 nodes or 96 cores made the Calxeda based server faster, but the BOM (Bill of Materials) attached to so much hardware was high.
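The node and core counts follow directly from the quad-core design of each node. A small sketch of that arithmetic, with variable names chosen purely for illustration:

```python
CORES_PER_NODE = 4          # each ECX-1000 node is a quad-core Cortex-A9
NODES_PER_CARD = 20 // 5    # 20 nodes spread over 5 daughter cards


def total_cores(nodes: int) -> int:
    """Number of Cortex-A9 cores in a given number of nodes."""
    return nodes * CORES_PER_NODE


xeon_cores = 2 * 8          # two octal-core Xeons
break_even_nodes = 20       # nodes needed to match the dual Xeon
tested_nodes = 24           # nodes in the full configuration mentioned in the text

print("nodes per daughter card:", NODES_PER_CARD)
print("cores at break-even:", total_cores(break_even_nodes))
print("cores in full system:", total_cores(tested_nodes))
print("A9 cores per Xeon core at break-even:",
      total_cores(break_even_nodes) / xeon_cores)
```

In other words, it took about five Cortex-A9 cores to stand in for one Xeon core, which is where the hardware cost comes from.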

While the Calxeda ECX-1000 could compete on performance/watt, it could not compete on performance per dollar. Also, the 4GB RAM limit per node made it unattractive for several markets such as web caching. As a result, Calxeda was relegated to a few niche markets such as the low end storage market where it had some success, but it was not enough. Calxeda ran out of venture capital, and a promising story ended too soon, unfortunately.

AppliedMicro X-Gene

Just recently, AppliedMicro showed off their X-Gene ARM SoCs, but those are 40nm SoCs. The 28nm "ShadowCat" X-Gene 2 is due in the first half of 2015. Just like the Atom C2000, the AppliedMicro X-Gene ARM SoC has four pairs of cores that share an L2 cache. However, the similarity ends there. The core is a lot beefier and it features 4-wide issue with an execution backend with four integer pipelines and three FP pipelines (one 128-bit FP, one Load, one Store). The 2.4GHz octal-core X-Gene also has a respectable 8MB L3 cache and can access up to four memory channels, with an integrated dual 10Gb Ethernet interface. In other words, the X-Gene is made to go after the Xeon E3, not the Atom C2000.

Of course, the AppliedMicro chip has been delayed many times. There were already performance announcements in 2011. The X-Gene1 8-core at 3GHz was supposed to be slightly slower than a quad-core Xeon E3-1260L "Sandy Bridge" at 2.4GHz in SPECINT_Rate2006.

Considering that the Haswell E3 is about 15-17% faster clock for clock, performance should be around that of a Xeon E3-1240L v3 at 2GHz. But the X-Gene1 only reached 2.4GHz and not 3GHz, so it looks like an E3-1240L v3 will probably outperform the new challenger by a considerable margin. The E3-1260L (v1) was a 45W chip and the E3-1240L v3 is a 25W TDP chip, and as a result we also expect the performance/watt of an E3-1240L v3 to be considerably better. Back in 2011, the SoC was expected to ship in late 2012 and have a two-year lead on the competition. It turned out to be two months.
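The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: it treats the "slightly slower than" projection as a tie, so the results are upper bounds, and it assumes performance scales linearly with clock speed, which real workloads rarely do. All names are illustrative.

```python
SANDY_BRIDGE_GHZ = 2.4            # E3-1260L clock in the 2011 projection
PROJECTED_XGENE_GHZ = 3.0         # X-Gene1 clock assumed by that projection
SHIPPING_XGENE_GHZ = 2.4          # clock the X-Gene1 actually reached
HASWELL_IPC_GAIN = (0.15, 0.17)   # Haswell E3 advantage clock for clock


def equivalent_haswell_ghz(ipc_gain: float, xgene_ghz: float) -> float:
    """Haswell clock that would roughly match an X-Gene1 at xgene_ghz.

    Assumes linear scaling with clock speed on both sides.
    """
    # Sandy Bridge clock matched by the X-Gene1 at the given clock
    sandy_equiv_ghz = SANDY_BRIDGE_GHZ * xgene_ghz / PROJECTED_XGENE_GHZ
    # Haswell needs fewer MHz for the same work
    return sandy_equiv_ghz / (1.0 + ipc_gain)


for label, clock in (("projected", PROJECTED_XGENE_GHZ),
                     ("shipping", SHIPPING_XGENE_GHZ)):
    low = equivalent_haswell_ghz(max(HASWELL_IPC_GAIN), clock)
    high = equivalent_haswell_ghz(min(HASWELL_IPC_GAIN), clock)
    print(f"{label} X-Gene1 at {clock}GHz ~ Haswell at {low:.2f}-{high:.2f}GHz")
```

With these assumptions the projected 3GHz part lands at about 2.05-2.09GHz of Haswell, close to the 2GHz E3-1240L v3, while the shipping 2.4GHz part drops to roughly 1.64-1.67GHz.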

Only a thorough test like our Calxeda review will really show what the X-Gene can do, but it is clear that AppliedMicro needs the X-Gene2 to be competitive. If AppliedMicro executes well with X-Gene2, it could get ahead once again... this time hopefully with a lead of more than two months.

Indeed, early next year, things could get really interesting: the X-Gene2 will double the number of cores to 16 (at 2.4GHz) or up the clock speed to 2.8GHz (8 cores) courtesy of TSMC's 28nm process technology. The X-Gene2 is supposed to offer 50% more performance/watt with the same number of cores.

AppliedMicro also announced the Skylark architecture inside X-Gene3. Courtesy of TSMC's 16nm node, the chip should run at up to 3GHz or have up to 64 cores. The chip should appear in 2016, but you'll forgive us for saying that we first want to see and review the X-Gene2 before we can be impressed with the X-Gene3 specs. We have seen too many vendors with high numbers on PowerPoint presentations that don't pan out in the real world. Nevertheless, the X-Gene2 looks very promising and is already running software. It just has to find a place in a real server in a timely fashion.


78 Comments


  • esterhasz - Thursday, December 18, 2014 - link

    But this is exactly why a wider array of machines based on their chips would make sense: the R&D cost is already spent anyways, since iPhone and iPad need chips, selling more units thus reduces R&D cost per unit. Economies of scale.

    I don't believe a MBA variant with ARM is down the road either, but the rumored iPad Pro could develop into something similar rather quickly.
  • OreoCookie - Tuesday, December 16, 2014 - link

    If you want to talk about ARM on the desktop, that's a whole other discussion, but one that most certainly needs to include price: if the price difference between a Broadwell-based Core M and a fictitious Apple A9X is $200~$230, then this changes the discussion completely. Two other factors are graphics performance (the Core M has »only« 1.3 billion transistors, the A8X ~2 billion, indicating that the mythical A9X may have faster graphics) and the fact that Apple controls the release schedule and can spec the SoC to meet its projected needs. To view this topic solely through the lens of CPU performance is myopic.
  • darkich - Friday, December 19, 2014 - link

    Your comparisons missed the picture spectacularly.
    A8X is a 20nm 2-4W TDP chip with a price that is probably around 70$.
    Top of the line Core M5Y70 is a 14nm 4.5 W TDP chip with a price of 270$.
    And it has a weaker GPU, btw. (raw performance). And it throttles massively, effectively giving only 50% of the benchmark performance.

    If you're going to compare that to an Apple chip, compare it to a 14nm A9X with custom derived PowerVR series 7 GPU,(scales up to 1,4 TFLOPS) vastly expanded memory controllers connected to a much faster RAM (compared to one in the iPad) upclocked to 2GHz, that are available at any time.
  • darkich - Friday, December 19, 2014 - link

    .. *with cores upclocked to about 2GHz
  • Flunk - Tuesday, December 16, 2014 - link

    Nintendo already sells ARM systems, the 3DS and the DS before it are both ARM-based. The PSVita is ARM too. I don't see an ARM Macbook Air anytime soon, they need a bigger and higher-clocking chip for that and it doesn't look like that's going to happen anytime soon.
  • Nintendo Maniac 64 - Tuesday, December 16, 2014 - link

    Even the Game Boy Advance used an ARM7 for its main CPU.
  • jjj - Tuesday, December 16, 2014 - link

    Obviously there are handhelds using ARM but the point was about bigger cores and clearly not handhelds.
  • DLoweinc - Tuesday, December 16, 2014 - link

    Don't quote Wikipedia, not suitable for this level of writing.
  • garbagedisposal - Tuesday, December 16, 2014 - link

    Says DLoweinc, master of knowledge and scholarly writing.
    In contrast to your childish and outdated opinion, Wikipedia is a perfectly valid source of information, go read about it and quit crying.
  • Daniel Egger - Tuesday, December 16, 2014 - link

    The problem really is the custom solutions can simply not compete with Intel on any level for general purpose computing (which the majority of applications are), not on performance/price, performance/power and not even on features/price.

    For instance I can see a huge market for sub-Xeon (or Atom C) performance at a corresponding price -> not going to happen because everyone is targeting > Xeon performance at ridiculous prices because they're expecting the margin to be there however there're simply too many compromises to be made by the buyers so that has to fail.

    Also I can see a huge demand for Atom C - Xeon performance at lower power consumption however no one seems to be really targetting this, all we get are Raspberry Pi's and a bit beefier but close from even Atom C. The new virtualisation techniques (Docker et al) opened a whole new can of possibilities for non-x86(_64) devices because virtualisation is suddenly possible and much more lightweight than ever before but no one seems to want to jump this opportunity.

    I'd really like to buy some affordable general purpose (BYOM/BYOS) hardware which has a little bit of oomph and takes little power which should be the powerful sides of any of the contenders but somehow all fail to deliver and I don't even see an attempt to change that.

    If I want mind-boggling performance at decent performance/price ratio with real virtualisation and 100% standard software compatibility there's no way around the high end Xeons (and maybe AMD iff they manage to get their asses back up) and none of the contenders is ever going to challenge that so they might as well stop trying.
