ARM Challenging Intel in the Server Market: An Overview
by Johan De Gelas on December 16, 2014 10:00 AM EST
The ARM Based Challengers
Calxeda, AppliedMicro and ARM – in that order – have been talking about ARM based servers for years now. There were rumors about Facebook adopting ARM servers back in 2010.
Calxeda was the first to release a real server, the Boston Viridis, launched back in the beginning of 2013. The Calxeda ECX-1000 was based on a quad Cortex-A9 with 4MB L2. It was pretty slow in most workloads, but it was incredibly energy efficient. We found it to be a decent CPU for low-end web workloads. Intel's alternative, the S1260, was in theory faster, but it was outperformed in real server workloads by 20-40% and needed twice as much power (15W versus 8.3 W).
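To put the efficiency claim above in numbers, here is a minimal sketch of the performance/watt arithmetic. It assumes the quoted 15W and 8.3W figures were measured on the same basis as the 20-40% throughput lead; the function and constant names are ours, purely for illustration.

```python
# Power figures quoted in the text above.
CALXEDA_POWER_W = 8.3      # Calxeda ECX-1000
ATOM_S1260_POWER_W = 15.0  # Intel S1260


def perf_per_watt_advantage(perf_lead: float) -> float:
    """Return the ECX-1000's perf/watt relative to the S1260.

    perf_lead is the ECX-1000's throughput lead as a fraction,
    e.g. 0.20 for "20% faster". Hypothetical helper for illustration.
    """
    relative_perf = 1.0 + perf_lead
    return relative_perf * ATOM_S1260_POWER_W / CALXEDA_POWER_W


low = perf_per_watt_advantage(0.20)
high = perf_per_watt_advantage(0.40)
print(f"perf/watt advantage: {low:.2f}x to {high:.2f}x")
# prints "perf/watt advantage: 2.17x to 2.53x"
```

Under those assumptions, the ECX-1000 delivered a bit more than twice the work per watt in the workloads where it led.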
Unfortunately, the single-threaded performance of the Cortex-A9 was too low. As a result, you needed quite a bit of expensive hardware to compete with a simple dual socket low power Xeon running VMs. About 20 nodes (5 daughter cards) of micro servers or 80 cores were necessary to compete with two octal-core Xeons. The fact that we could use 24 nodes or 96 cores made the Calxeda based server faster, but the BOM (Bill of Materials) attached to so much hardware was high.
While the Calxeda ECX-1000 could compete on performance/watt, it could not compete on performance per dollar. Also, the 4GB RAM limit per node made it unattractive for several markets such as web caching. As a result, Calxeda was relegated to a few niche markets such as the low end storage market where it had some success, but it was not enough. Calxeda ran out of venture capital, and a promising story ended too soon, unfortunately.
AppliedMicro X-Gene
Just recently, AppliedMicro showed off their X-Gene ARM SoCs, but those are 40nm SoCs. The 28nm "ShadowCat" X-Gene 2 is due in H1 2015. Just like Atom C2000, the AppliedMicro X-Gene ARM SoC has four pairs of cores that share an L2 cache. However, the similarity ends there. The core is a lot beefier and it features 4-wide issue with an execution backend with four integer pipelines and three FP pipelines (one 128-bit FP, one Load, one Store). The 2.4GHz octal-core X-Gene also has a respectable 8MB L3 cache and can access up to four memory channels, with an integrated dual 10Gb Ethernet interface. In other words, the X-Gene is made to go after the Xeon E3, not the Atom C2000.
Of course, the AppliedMicro chip has been delayed many times. There were already performance announcements in 2011. The X-Gene1 8-core at 3GHz was supposed to be slightly slower than a quad-core Xeon E3-1260L "Sandy Bridge" at 2.4GHz in SPECINT_Rate2006.
Considering that the Haswell E3 is about 15-17% faster clock for clock, performance should be around that of a Xeon E3-1240L v3 at 2GHz. But the X-Gene1 only reached 2.4GHz and not 3GHz, so it looks like an E3-1240L v3 will probably outperform the new challenger by a considerable margin. The E3-1260L (v1) was a 45W chip and the E3-1240L v3 is a 25W TDP chip, and as a result we also expect the performance/watt of an E3-1240L v3 to be considerably better. Back in 2011, the SoC was expected to ship in late 2012 and have a two-year lead on the competition. It turned out to be two months.
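The clock-for-clock estimate above can be written out as a rough sketch. It rests on two assumptions of ours: that the 2011 claim puts a 3GHz X-Gene1 roughly level with a 2.4GHz Sandy Bridge E3-1260L, and that X-Gene performance scales linearly with clock speed (a best case, since memory-bound workloads scale less). All names are hypothetical.

```python
# Figures taken from the text above.
SANDY_BRIDGE_CLOCK_GHZ = 2.4        # E3-1260L, the 2011 comparison point
HASWELL_PER_CLOCK_GAIN = (0.15, 0.17)
XGENE_PROMISED_GHZ = 3.0
XGENE_SHIPPED_GHZ = 2.4


def haswell_equivalent_ghz(per_clock_gain: float) -> float:
    """Haswell clock that matches a 2.4GHz Sandy Bridge core."""
    return SANDY_BRIDGE_CLOCK_GHZ / (1.0 + per_clock_gain)


# ASSUMPTION: X-Gene1 performance scales linearly with its clock.
clock_shortfall = XGENE_SHIPPED_GHZ / XGENE_PROMISED_GHZ

for gain in HASWELL_PER_CLOCK_GAIN:
    promised = haswell_equivalent_ghz(gain)
    shipped = promised * clock_shortfall
    print(f"gain {gain:.0%}: promised ~{promised:.2f}GHz Haswell, "
          f"shipped ~{shipped:.2f}GHz Haswell")
```

With these inputs the promised 3GHz part lands at roughly a 2.05-2.09GHz Haswell, which is why the 2GHz E3-1240L v3 is the natural comparison, while the shipping 2.4GHz part drops to about 1.64-1.67GHz Haswell-equivalent at best.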
Only a thorough test like our Calxeda review will really show what the X-Gene can do, but it is clear that AppliedMicro needs the X-Gene2 to be competitive. If AppliedMicro executes well with X-Gene2, it could get ahead once again... this time hopefully with a lead of more than two months.
Indeed, early next year, things could get really interesting: the X-Gene2 will double the number of cores to 16 (at 2.4GHz) or up the clock speed to 2.8GHz (8 cores) courtesy of TSMC's 28nm process technology. The X-Gene2 is supposed to offer 50% more performance/watt with the same number of cores.
AppliedMicro also announced the Skylark architecture inside X-Gene3. Courtesy of TSMC's 16nm node, the chip should run at up to 3GHz or have up to 64 cores. The chip should appear in 2016, but you'll forgive us for saying that we first want to see and review the X-Gene2 before we can be impressed with the X-Gene3 specs. We have seen too many vendors with high numbers on PowerPoint presentations that don't pan out in the real world. Nevertheless, the X-Gene2 looks very promising and is already running software. It just has to find a place in a real server in a timely fashion.
78 Comments
jjj - Tuesday, December 16, 2014
If you look at phones and tabs, we might be getting some rather big custom cores in 2015 and 2016. Apple and Nvidia already have that, ofc much smaller than Intel's core when adjusting for process (actually that's an assumption when it comes to Denver since I don't think we've seen any die shots).
Intel at the same time in consumer is pushing for more non-CPU/GPU compute units and low power and they might face a tough question about core size and even process (if they target low clocks, low power, or the opposite).
Got to wonder if at some point they'll have to go for a big core just for server.
Would make things even more interesting.
Might not matter but Apple kinda has the perf for an ARM Macbook Air if they go quad. Not something worth doing for such low volume but doable when they go quad on all ipads or sooner if they launch a bigger ipad. Could be a trigger for others pushing more ARM based Chromebooks and beyond. That would set the stage for even bigger ARM cores.
Also got the feeling Nintendo will go ARM in 2016 and not many reasons for Sony and M$ not to go that way if they ever make a new gen- just another market for bigger ARM cores, any significant revenue helps with dev costs so it matters.
CajunArson - Tuesday, December 16, 2014
1. The Core-m is widely derided as not being fast enough for the MacBook Air.
2. The Core-m is easily twice as fast as the A8X in benchmarks that count... even Anandtech's own benchmarks show that. Furthermore, when you step away from web browsers and get to use the advanced features of the Core-m like AVX, that advantage jumps to about 8x faster in compute-heavy benchmarks like Linpack.
3. Even the mythical A9 coming in 2015 is expected to have roughly a 20% performance boost over the A8x.
4. Any real computer using an ARM chip would have to have a translation layer just like the old Rosetta to run the huge library of x86 software out there. Rosetta sort of worked because the Core 2 chips from Intel were *massively* faster than the PowerPC parts they replaced. Now you expect to run the translation overhead on an A9 chip that is slower -- by a large margin -- than the Core-m parts you've already derided as not being good enough?
Yeah, I'm not holding my breath.
fjdulles - Tuesday, December 16, 2014
You may be right, but remember that ARM chips using the same power budget as Intel Core i* will no doubt be clocked higher and perform that much better. Not sure if that will be competitive but it would be interesting to see.
wallysb01 - Tuesday, December 16, 2014
Only if you want a glorified tablet as a laptop. The software most people use in real work on laptops/desktops is not going to be ported over to ARM at any speed, even if ARMs could do that work reasonably well.
Kevin G - Wednesday, December 17, 2014
I'm under the impression that a good chunk has already been ported. MS Office for example is native ARM on Windows RT. Various Linux distributions have ARM ports completed with ARM based office and desktop software. The main thing missing are some big commercial applications like Photoshop etc.
The server side of things is similar with Linux and open software ports. MS is weirdly absent but I suspect that an ARM based version of Windows 2012/2014 is waiting on major hardware to be released. Much of the Windows base is already ported over to ARM due to Windows RT.
Kevin G - Wednesday, December 17, 2014
Indeed. Performance of ARM platforms once power constraints have been removed is a very open question. So far all the core designs in products have been used in mobile where SoC power consumption is less than 5 W. What a 100 W product would look like is an open and very interesting question.
Ratman6161 - Wednesday, December 17, 2014
If they "use the same power budget as an Intel core i*" then what would be the point?
jjj - Tuesday, December 16, 2014
Ok you are focusing on the wrong thing but let's do that anyway.
I have never claimed that Apple's own SoC would beat Intel's current SoCs, just that the perf would be enough if they go quad and obviously higher clocks.
When you talk Core M you should remember that the price at launch was $281 so it's not good enough for anything.
Anyway how about you compare a possible Apple SoC with a MacBook Air from 2011, let's face it the Air is a crap machine anyway, not much perf and TN panel for w/e ridiculous price it costs now and its users are certainly not doing any heavy lifting with it.
At the same time Apple's own $15-20 SoC would allow them a much cheaper machine and a presence in a price segment they never competed in, adding at least 5B of revenue per year (including cannibalization) and a share gain in PC of 2-3%.
But then again the point was that there are a bunch of trends that could favor bigger ARM cores.
Morawka - Wednesday, December 17, 2014
It might cost them $20 for the A8X in fab cost, but the R&D for that chip is in the tens of millions. Factor that in, to however many they ship, and it adds at least another $20 per chip.
jospoortvliet - Wednesday, December 17, 2014
Even more obvious then that this would save them money by spreading out the fixed costs over more devices...