With the advent of higher-performance Arm-based cloud computing, a lot of attention is on what the various competitors can do in this space. We’ve covered Ampere Computing’s previous eMag products, which came from the acquisition of Applied Micro. The next-generation hardware is called Altra, and after a few months of teasing its high-performance compute, the company is finally announcing its product list, as well as an upcoming product due for sampling this year.

Ampere’s Altra is a realized version of Arm’s Neoverse N1 enterprise core, much like Amazon’s Graviton2, but this time in an 80-core arrangement. Where Graviton2 is designed to suit Amazon’s needs for Arm-based instances, Ampere’s goal is essentially to supply a better-than-Graviton2 solution to the rest of the big cloud service providers (CSPs). Of the companies that have publicly committed to an N1-based design, Ampere’s is, on paper, the biggest and fastest so far.

The Ampere Altra range, as part of today’s release, will offer parts from 32 cores up to 80 cores, at up to 3.3 GHz, with a variety of TDPs up to 250 W. As we’ve described in our previous news items on the chip, this is an Arm v8.2 core with a few 8.3+8.5 features. It offers support for FP16 and INT8, supports 8 channels of DDR4-3200 ECC at 2 DIMMs per channel, and takes up to 4 TiB of memory per socket in a 1P or 2P configuration. Each CPU will offer 128 PCIe 4.0 lanes, 32 of which can be used for socket-to-socket communications implemented with the CCIX protocol over PCIe. This means 50 GB/s in each direction, and with 96 lanes left over on each socket, 192 PCIe 4.0 lanes in a dual socket system for add-in cards. Each of the PCIe lanes can bifurcate down to x2.
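The platform figures above can be cross-checked with a little arithmetic. The sketch below is only illustrative: the constant and function names are made up for this example, and the memory bandwidth line assumes the standard 64-bit (8-byte) DDR4 data path per channel, which the article does not state. It yields a theoretical peak, not a measured number.

```python
# Back-of-the-envelope platform arithmetic using the figures quoted above.
# All names here are hypothetical, chosen for this sketch only.
LANES_PER_SOCKET = 128        # PCIe 4.0 lanes per CPU
LANES_FOR_SOCKET_LINK = 32    # lanes per CPU usable for CCIX socket-to-socket links
MEM_CHANNELS = 8
DIMMS_PER_CHANNEL = 2
MAX_MEMORY_GIB = 4 * 1024     # 4 TiB per socket
DDR4_MT_PER_S = 3200
BYTES_PER_TRANSFER = 8        # assumption: 64-bit data bus per channel, ECC bits excluded


def free_lanes(sockets: int) -> int:
    """PCIe lanes left for add-in cards in a 1P or 2P system."""
    if sockets == 1:
        return LANES_PER_SOCKET
    if sockets == 2:
        return 2 * (LANES_PER_SOCKET - LANES_FOR_SOCKET_LINK)
    raise ValueError("Altra is described only in 1P and 2P configurations")


dimm_slots = MEM_CHANNELS * DIMMS_PER_CHANNEL            # DIMMs per socket
gib_per_dimm = MAX_MEMORY_GIB // dimm_slots              # DIMM size needed to reach 4 TiB
peak_mem_gb_s = MEM_CHANNELS * DDR4_MT_PER_S * BYTES_PER_TRANSFER / 1000

print(free_lanes(2))     # 192
print(dimm_slots)        # 16
print(gib_per_dimm)      # 256
print(peak_mem_gb_s)     # 204.8 (theoretical peak, GB/s per socket)
```

In other words, reaching the 4 TiB ceiling implies sixteen 256 GiB DIMMs per socket.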

Ampere 1st Gen Altra 'QuickSilver' Product List

SKU       Cores   Frequency   TDP     PCIe      DDR4       Price
Q80-33    80      3.3 GHz     250 W   128x G4   8 x 3200   ?
Q80-30    80      3.0 GHz     210 W   128x G4   8 x 3200   ?
Q80-26    80      2.6 GHz     175 W   128x G4   8 x 3200   ?
Q80-23    80      2.3 GHz     150 W   128x G4   8 x 3200   ?
Q72-30    72      3.0 GHz     195 W   128x G4   8 x 3200   ?
Q64-33    64      3.3 GHz     220 W   128x G4   8 x 3200   ?
Q64-30    64      3.0 GHz     180 W   128x G4   8 x 3200   ?
Q64-26    64      2.6 GHz     125 W   128x G4   8 x 3200   ?
Q64-24    64      2.4 GHz      95 W   128x G4   8 x 3200   ?
Q48-22    48      2.2 GHz      85 W   128x G4   8 x 3200   ?
Q32-17*   32      1.7 GHz      58 W   128x G4   8 x 3200   ?
Q32-17    32      1.7 GHz      45 W   128x G4   8 x 3200   ?

*With 4 TiB DRAM Installed

I must credit Ampere here. This is by far the easiest product naming scheme I’ve ever seen. Intel could learn a million things from this naming scheme alone. The ‘Q’ stands for QuickSilver, the codename of the underlying SoC, followed by a core count and a frequency.
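To show how regular the scheme is, here is a minimal sketch of a parser for these names. It assumes the pattern holds exactly as it appears in the product list: one codename letter, a core count, a dash, and two digits giving the frequency in tenths of a GHz. The `Sku` type and `parse_sku` function are hypothetical helpers, not anything Ampere provides.

```python
import re
from typing import NamedTuple


class Sku(NamedTuple):
    family: str   # codename letter, e.g. 'Q' for QuickSilver
    cores: int
    ghz: float


# Assumed pattern: <letter><core count>-<frequency in tenths of a GHz>
_SKU_RE = re.compile(r"^([A-Z])(\d+)-(\d{2})$")


def parse_sku(name: str) -> Sku:
    """Split a name such as 'Q80-33' into family, core count and frequency."""
    match = _SKU_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised SKU name: {name!r}")
    family, cores, tenths = match.groups()
    return Sku(family, int(cores), int(tenths) / 10)


print(parse_sku("Q80-33"))   # Sku(family='Q', cores=80, ghz=3.3)
print(parse_sku("Q64-24"))   # Sku(family='Q', cores=64, ghz=2.4)
```

The name alone does not carry the TDP, which is why the two Q32-17 entries in the list need a footnote to tell them apart.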

Previously Ampere had stated they were going for 80 cores at 3.0 GHz at 210 W. However, the Q80-33 pushes that frequency another 300 MHz for another 40 W. We understand that the silicon coming back from TSMC performed better than expected, hence this new top processor.

It’s worth doing some basic metrics on power efficiency. If we take the TDP as solely the power for the cores, work out the Watts per core, and then divide the frequency by that figure to get GHz per Watt (per core), the top Q80-33 SKU scores 1.06, around the middle of the pack (most CPUs score 0.95-1.25 GHz/W). The highlight of the list by this metric is the Q64-24, offering the most frequency for the least power: 1.62 GHz per Watt.
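The metric is simple enough to reproduce from the product list. The sketch below takes the same simplification as the text, treating the whole TDP as core power, and computes Watts per core and then frequency divided by Watts per core. The dictionary and function names are illustrative only; the 58 W Q32-17 variant is keyed with its table asterisk to keep the two entries distinct.

```python
# (cores, GHz, TDP in W) taken from the product list above.
SKUS = {
    "Q80-33": (80, 3.3, 250), "Q80-30": (80, 3.0, 210),
    "Q80-26": (80, 2.6, 175), "Q80-23": (80, 2.3, 150),
    "Q72-30": (72, 3.0, 195), "Q64-33": (64, 3.3, 220),
    "Q64-30": (64, 3.0, 180), "Q64-26": (64, 2.6, 125),
    "Q64-24": (64, 2.4, 95),  "Q48-22": (48, 2.2, 85),
    "Q32-17*": (32, 1.7, 58), "Q32-17": (32, 1.7, 45),
}


def watts_per_core(cores: int, tdp_w: float) -> float:
    """TDP spread evenly across the cores (ignores uncore, memory and I/O power)."""
    return tdp_w / cores


def ghz_per_watt(cores: int, ghz: float, tdp_w: float) -> float:
    """Frequency divided by Watts per core."""
    return ghz / watts_per_core(cores, tdp_w)


scores = {name: ghz_per_watt(*spec) for name, spec in SKUS.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:8s} {watts_per_core(SKUS[name][0], SKUS[name][2]):5.2f} W/core "
          f"{score:5.2f} GHz/W")

best = max(scores, key=scores.get)
print(best)   # Q64-24
```

Because the TDP also covers the memory controllers and PCIe, this understates the efficiency of the low-power parts in particular; it is a rough comparison, not a power model.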

Also, just because we have the numbers, AMD’s big Rome CPUs consume about 3 W per core at full load, and run at approximately 3.0 GHz on all cores. Altra, by comparison, uses 2.6 W per core on the Q80-30. These Altra CPUs have no turbo mechanism, and thus the TDP figures given by Ampere are the literal peak power consumption numbers. What is listed above is therefore a design point for chassis building, rather than a full representation of power consumption when deployed in the cloud.

Ampere states they have a number of ODMs on board that will be ready to provide Altra systems, including Gigabyte and Wiwynn, with a couple of second tier players also in the mix. These systems should be more readily available in August and September.

When we asked Ampere about the interest in these chips, the company stated that most of the interest from CSPs was actually in high-end dual socket deployments, with the highest core counts and the highest frequencies. Even though Ampere isn’t announcing pricing publicly, the company states that its pricing has not been an obstacle for CSP deployments, with major customers having tested the hardware for up to 2 months already. Current announced customers include Packet and Cloudflare, with Packet offering early access for its key clients.

Ampere is also one of the lead partners for CUDA on Arm, and is set to offer full CUDA support for Altra when paired with NVIDIA graphics accelerators.

Altra Max

If that wasn’t enough, Ampere dropped a sizeable nugget into our pre-announcement briefing. The company is set to launch a 128-core version of Altra later this year. 

This will be a new silicon design, beyond Ampere's initial layout of 80 cores for Altra. Ampere states that while it uses the same platform as the regular Altra, the company has done extensive tweaking and optimization within the mesh interconnect for Altra Max to hide the additional contention that might occur when using the same main memory speeds.

Altra Max will be socket and pin-compatible with Altra, and will also support dual socket deployments. Ampere states that the silicon will be ready for early sampling with partners in Q4, and that it is looking to move into high volume in mid-2021. The 128-core design was given the code-name Mystique, and so we might expect to see these CPUs start with the letter M.

Update on 5nm Siryn

The next generation of Ampere’s product line, as previously reported, is going to use the codename Siryn (sire-inn) and be built on TSMC’s 5nm process, set for sampling in late 2021. Ampere stated in our briefing that test chips that use IP meant to be adopted in Siryn have already taped out - the actual Siryn chip will tape out sometime in the next year.

Siryn will likely be marketed as ‘2nd Generation Altra’, and if the naming convention of CPUs stays the same, these will start with ‘S80’ etc. Ampere has stated that the Siryn platform will be new, especially because of new technologies (PCIe 5.0 and DDR5 were mentioned, but not confirmed for Siryn).

19 Comments

  • back2future - Tuesday, June 23, 2020 - link

    or
    https://www.computeexpresslink.org/download-the-sp...
  • back2future - Tuesday, June 23, 2020 - link

    Having a look at figure 4 schedulers load becomes visible and explains ~85% node utilization on two years period, average waiting time for public queues on Tachyon2 system ~9.3h
    https://www.mdpi.com/2076-3417/10/7/2634/pdf
    It's about cores and node utilization, but maybe also about more dynamic bandwidth and dynamic interconnecting abroad?
    Interconnect family: https://www.top500.org/statistics/efficiency-power...
  • thetrashcanisfull - Tuesday, June 23, 2020 - link

    Interesting that only 32 lanes can be dedicated to inter-socket communication - that is *much* less than Rome (Epyc 7002 series can dedicate either 64 or 48 lanes) despite having almost identical per-socket memory and I/O bandwidth. I'll be curious to see how/if this impacts performance.

    Also interesting that the full I/O and memory capabilities are available even on the 45W and 58W TDP parts - I would expect those to take up a huge portion of such a small power budget.
  • MrSpadge - Tuesday, June 23, 2020 - link

    Maybe the TDP is just for the cores?
  • mode_13h - Wednesday, June 24, 2020 - link

    Isn't x32 the max number of lanes that PCIe can bond? AMD clearly got around that, somehow, but maybe Ampere is using some off-the-shelf PCIe IP.
  • MrSpadge - Tuesday, June 23, 2020 - link

    "I must credit Ampere here. This is by far the easiest product naming scheme I’ve ever seen. Intel could learn a million things from this naming scheme alone."

    Agreed. And not only the naming, but also the very meaningful product differentiation is something I wish I'd see from Intel. Do we really need 4 different Core i5, which only differ by 100 - 200 MHz?
  • serendip - Thursday, June 25, 2020 - link

    The beginning of the end for Intel? They're still strong for anything server-related but Xeons don't look competitive when faced with Rome and Altra. If you want the best x86 bang for the buck, go for AMD; for the most performance per watt, go for ARM. Intel's being squeezed in the middle and continuing process woes aren't helping.
  • techbug - Tuesday, October 13, 2020 - link

    " This means 50 GB/s in each direction, and 192 PCIe 4.0 lanes in a dual socket system for add-in cards"

    32lanes * 25GT/s / 8 bit/B = 100GB/s?
  • carcakes - Friday, April 9, 2021 - link

    Ampere® Altra™ Family of Cloud Native Processors expands to 128 cores with Altra Max™. They do workstations as well.

    Maybe notebooks..
