It’s been nearly 10 years since Arm first announced the Armv8 architecture in October 2011, and it’s been quite an eventful decade of computing as the instruction set architecture saw increased adoption, from the mobile space to the server space, and is now starting to become common in the consumer devices market, such as laptops and upcoming desktop machines. Throughout the years, Arm has evolved the ISA with various updates and extensions to the architecture, some important, some easily glossed over.

Today, as part of its Vision Day event, Arm is announcing the first details of its new Armv9 architecture, setting the foundation for what the company hopes to be the computing platform for the next 300 billion chips over the coming decade.

The big question that readers will likely be asking themselves is what exactly differentiates Armv9 from Armv8 to warrant such a large jump in the ISA nomenclature. Truthfully, from a purely ISA standpoint, v9 probably isn’t as fundamental a jump as v8 was over v7: v8 introduced a completely different execution mode and instruction set with AArch64, which had larger microarchitectural ramifications over AArch32, such as extended registers, 64-bit virtual address spaces, and many more improvements.

Armv9 continues to use AArch64 as the baseline instruction set, but adds a few very important extensions to its capabilities that warrant an increment in the architecture numbering, and probably also allow Arm to achieve a sort of software re-baselining of not only the new v9 features, but also the various v8 extensions we’ve seen released over the years.

Arm sees three main pillars as the goals of the new Armv9 architecture: security, AI, and improved vector and DSP capabilities. Security is a very big topic for v9 and we’ll go into the new extensions and features in more depth in a bit, but getting the DSP and AI features out of the way first should be straightforward.

Probably the biggest new feature promised with new Armv9-compatible CPUs, and the one that will be immediately visible to developers and users, is the baselining of SVE2 as a successor to NEON.

The Scalable Vector Extension, or SVE, in its first implementation was announced back in 2016 and implemented for the first time in Fujitsu’s A64FX CPU cores, now powering Fugaku, the world’s #1 supercomputer, in Japan. The problem with SVE was that this first iteration of the new variable-vector-length SIMD instruction set was rather limited in scope and aimed more at HPC workloads, missing many of the more versatile instructions still covered by NEON.

SVE2 was announced back in April 2019 and looked to solve this issue by complementing the new scalable SIMD instruction set with the instructions needed to serve the more varied DSP-like workloads that currently still use NEON.

The benefit of SVE and SVE2, beyond adding various modern SIMD capabilities, is their variable vector size, ranging from 128b to 2048b in 128b increments, irrespective of what hardware the code is actually running on. Purely from the view of vector processing and programming, it means that a software developer only ever has to compile their code once, and if in the future a CPU comes out with, say, native 512b SIMD execution pipelines, the code will already be able to take advantage of the full width of the units. Similarly, the same code can run on more conservative designs with a narrower hardware execution width, which is important to Arm as it designs CPUs from IoT, to mobile, to datacentres. It also does all of this whilst remaining within the 32b encoding space of the Arm architecture, whereas alternative implementations such as on x86 have to add new extensions and instructions depending on vector size.
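
To illustrate what vector-length-agnostic programming looks like in practice, here is a minimal sketch in C using the ACLE SVE intrinsics from arm_sve.h; the function and loop are my own illustrative example, assuming an SVE-enabled compiler, and not code taken from Arm.

```c
#include <arm_sve.h>   /* ACLE SVE intrinsics; requires an SVE-enabled toolchain */

/* Vector-length-agnostic "y += a * x" loop: the same binary runs on 128b,
 * 256b, or 512b SVE hardware, because it steps by svcntw() - the number of
 * 32-bit lanes the hardware actually implements - and uses a predicate to
 * mask off the loop tail. */
void axpy(float *restrict y, const float *restrict x, float a, long n) {
    for (long i = 0; i < n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32(i, n);       /* disable out-of-range lanes */
        svfloat32_t xv = svld1_f32(pg, &x[i]);
        svfloat32_t yv = svld1_f32(pg, &y[i]);
        yv = svmla_n_f32_x(pg, yv, xv, a);       /* yv += xv * a */
        svst1_f32(pg, &y[i], yv);
    }
}
```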

Machine learning is also seen as an important part of Armv9, as Arm expects more and more ML workloads to become commonplace in the coming years. Running ML workloads on dedicated accelerators will naturally still be a requirement for anything that is performance- or power-efficiency-critical, but there will also be vast new adoption of smaller-scope ML workloads that run on CPUs.

Matrix multiplication instructions are key here, and as a baseline feature of v9 CPUs they will represent an important step towards larger adoption across the ecosystem.
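
For a sense of what these instructions actually compute, below is a plain-C reference sketch of the 2x2 accumulation tile behind the Int8 matrix-multiply instructions such as SMMLA; it is purely illustrative and not Arm's own definition.

```c
#include <stdint.h>

/* Reference model of the tile computed by the Int8 matrix-multiply
 * instructions (e.g. SMMLA): a 2x8 block of signed 8-bit values times the
 * transpose of another 2x8 block, accumulated into a 2x2 block of 32-bit
 * results. Hardware performs this in a single instruction per pair of
 * vector registers. */
static void smmla_ref(int32_t acc[2][2], const int8_t a[2][8], const int8_t b[2][8]) {
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < 8; k++)
                acc[i][j] += (int32_t)a[i][k] * (int32_t)b[j][k];
}
```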

Generally, I see SVE2 as probably the most important factor warranting the jump to a v9 nomenclature, as it’s a more definitive ISA feature that differentiates v9 from v8 CPUs in everyday usage, and one that would push the software ecosystem to actually diverge from the existing v8 stack. That has actually become quite a problem for Arm in the server space, as the software ecosystem still baselines software packages on v8.0, which unfortunately is missing the all-important v8.1 Large System Extensions (LSE).

Having the whole software ecosystem move forward and be able to assume that new v9 hardware has the capabilities of the new architectural extensions would help push things ahead, and probably solve some of the current situation.
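
As a concrete (and entirely illustrative) example of why the LSE baseline matters, consider a simple atomic counter in C and how differently it compiles depending on the architecture level the toolchain is allowed to assume.

```c
#include <stdatomic.h>

/* A contended shared counter. Compiled against a plain v8.0 baseline this
 * lowers to an LDXR/STXR retry loop; with v8.1 LSE available (e.g.
 * -march=armv8.1-a, or GCC/Clang's -moutline-atomics runtime dispatch) it
 * becomes a single LDADD instruction, which behaves far better under
 * contention on large core-count servers. */
static _Atomic long counter;

long bump(void) {
    return atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
}
```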

However, v9 isn’t only about SVE2 and new instructions; it also has a very large focus on security, where we’ll be seeing some more radical changes.

Introducing the Confidential Compute Architecture
Comments

  • melgross - Saturday, April 10, 2021 - link

    Yes, before ARM had even announced that their 64-bit core was suitable for anything besides the servers it was aimed at, Apple came out with their 2-core version in the A7, shocking the entire industry.

    I would be surprised if they use this in their A15 later this year.
  • ballsystemlord - Tuesday, March 30, 2021 - link

    They want to include ray-tracing?! Mobile phones, the biggest market I'm aware of for ARM GPUs, are not even able to afford to include the complete GPU+CPU+caches. They use too much area and power to work in that form factor.
    How on earth would they get ray-tracing in there too?
  • grant3 - Wednesday, March 31, 2021 - link

    My layman's understanding of the ARM ecosystem is they're not exclusively for use in mobile phones. And that licensees can design different processors, with different tradeoffs, to suit different purposes.

    So perhaps it's unlikely that someone will design an ARM chip with raytracing silicon for mobile phones any time soon.

    But it certainly seems plausible that sometime in the next 5-10 years, someone will be interested in building a different kind of device with a -larger- form factor that has the thermal and power consumption envelope to support a ray-tracing-enabled ARM processor.
  • ballsystemlord - Wednesday, March 31, 2021 - link

    Good point. They could be just future proofing themselves and allocating some IP so that they can compete.
  • iphonebestgamephone - Thursday, April 1, 2021 - link

    Huawei had worked on raytracing on android, shown in some demos.

    https://www.reddit.com/r/Android/comments/eczftf/n...

    It's not like raytracing means it looks like what's shown using an RTX 3080.
  • dicobalt - Wednesday, March 31, 2021 - link

    So long as the operating system running on the ARM chip is capable of updating itself. No ridiculous Android philosophy of placing this task in the hands of inept OEMs. We're gonna need a real OS like Windows, Linux, or even MacOS.
  • Findecanor - Friday, April 2, 2021 - link

    Don't attribute to ineptitude what can be adequately explained by malice. The OEMs want you to buy new hardware when your banking app no longer works.
  • Silver5urfer - Wednesday, March 31, 2021 - link

    Ah, the good old x86 death threat comments - how long has it been since the last one? Anyway, AI is not going to dethrone x86: everyone is going to buy the leader's chips (Nvidia) or they will make their own. Also, Intel has FPGAs and Xilinx (a.k.a. AMD) has FPGAs as well, so they can build specialized cores whenever they feel like it.

    Apple is not competing in the server space, so they cannot touch AMD and Intel volumes in x86; all they do is consumer business, and all their servers also run on x86 lmao. The ARM dominance over x86 doesn't exist: per server market share it doesn't come close, since over 95% of it is x86, and AMD is now slowly taking away Intel's Xeon share with the EPYC series.

    So far no ARM processor has beaten EPYC Rome; the AWS Graviton2 is exclusive to Amazon, and the Microshaft rumours about building their own chip would be exclusive too - they want centralization of power into their ecosystem because oil's age of power is over. Anyway, so what's left? Google? Hah, their incompetent and politically radicalized nature is utterly stupid and their castration of Android is unforgettable. They are simply moving all of AOSP into Google services, turning it into another Apple walled garden, their HW is pathetic, and the only agenda is dumbing down. So ARM works there because phones can only run on ARM HW. Yes, they outnumber desktop parts by a huge margin, but the world still relies on x86 computing, even if the SW is dumbed down (Win10 UWP etc., Mac OS into a phone-hybrid OS, fewer power-user features); there's a massive market of Dell / HP / Lenovo / Supermicro / Gigabyte who all cater to x86 ONLY. So the hero ARM doesn't have an OEM lol. That latest 80C Ampere Altra of course is available, but it's weak vs AMD, and Intel Ice Lake Xeon is coming as well, with fat stacks already sent to the CTOs to get Intel HW only. Marvell Thunder? Last time I heard they were going to build custom chips. Fujitsu A64FX? Custom. Oh, I forgot Nuvia - Qualcomm swallowed them, so they are going to resurrect Centriq? After how they axed all custom in-house designs with it and push only ARM cores on Android... I guess so.

    Finally, what does ARM provide? More custom BS where you cannot do anything, since the OEM owns your HW top to bottom, and you cannot have good backwards compat because the SW is made for dumbed-down users? Hint: the Surface SQ2 - and to be honest even the x86 Surface has highly locked-down HW. Macbooks? Everything soldered down and locked down. What else do consumers have to rave so hard about with ARM? I suppose the Raspberry Pi, which is going to dethrone x86 (the Pi is amazing HW, not doubting that at all, but people have to realize what it is that ARM provides them over x86 in both the HW and SW stack and in user customization). Finally the Switch: it is huge in numbers and new HW is on the horizon for the DLSS-equipped Pro edition, but is it comparable to the AMD SoCs in the Xbox and PS? Nope.

    But yeah x86 is going to die lmao.
  • viktorcode - Wednesday, March 31, 2021 - link

    I would love to archive this comment for posterity...
  • Wilco1 - Wednesday, March 31, 2021 - link

    "So far no ARM processor beat EPYC Rome, next the AWS Graviton2 is excl. to Amazon"

    If you had bothered to read the Milan review, you would know that Ampere Altra not only outperforms Rome by a good margin, but matches Milan as well (1% faster on 1S SPECINT_rate). All that with a 2-year old core and 1/8th of the cache... ~15% of AWS is now Graviton and still growing fast, so it is obviously displacing a huge amount of x86 servers.
