While Qualcomm has become wildly successful in the Arm SoC market for Android smartphones, their efforts to parlay that into success in other markets have come up short so far. The company has produced several generations of chips for Windows-on-Arm laptops, and while each has incrementally improved on matters, it’s not been enough to dislodge a highly dominant Intel. And while the lack of success of Windows-on-Arm is far from solely being Qualcomm’s fault – there’s a lot to be said for the OS and software – silicon has certainly played a part. To make serious inroads on the market, it’s not enough to produce incrementally better chips – Qualcomm needs to make a major leap in performance.

Now, after nearly three years of hard work, Qualcomm is getting ready to do just that. This morning, the company is previewing their upcoming Snapdragon X Elite SoC, their next-generation Arm SoC designed for Windows devices. Based on a brand-new Arm CPU core design from their Nuvia subsidiary dubbed “Oryon”, the Snapdragon X Elite is to be the tip of the iceberg for a new generation of Qualcomm SoC designs. Not only is Oryon the heart and soul of Qualcomm’s most important Windows-on-Arm SoC to date, but it will eventually be in smartphones and a whole lot more.

But we’re getting ahead of ourselves. For now let’s focus on the Snapdragon X Elite SoC and the Oryon cores underpinning it.

While this morning’s announcement from Qualcomm is far from a deep dive on the hardware, it’s our first look at what will be Qualcomm’s flagship SoC, and the new CPU cores within it. With a projected launch date of mid-2024, the first laptops based on the SoC are still several months away from hitting retail shelves – and about a year delayed overall. Nonetheless, Qualcomm has finished their silicon development work, and with the chip’s specifications locked down, the company is now on to polishing things for a launch next year.

The Oryon CPU cores within the Snapdragon X Elite are the culmination of Qualcomm’s Nuvia acquisition from early 2021, and an even longer period of work for the Nuvia team. The ambition of the team, and the importance of the custom Arm architecture CPU cores, cannot be overstated. So the Snapdragon X Elite is going to be an interesting chip on multiple levels, as it sets the pace for the next generation of Qualcomm chip designs.

Snapdragon Compute (Windows-on-Arm) Silicon

AnandTech | Snapdragon X Elite | Snapdragon 8cx Gen 3 | Snapdragon 8cx Gen 2 | Snapdragon 8cx Gen 1
Prime Cores | 12x Oryon, 3.80 GHz (2C Turbo: 4.3 GHz) | 4x C-X1, 3.00 GHz | 4x C-A76, 3.15 GHz | 4x C-A76, 2.84 GHz
Efficiency Cores | N/A | 4x C-A78, 2.40 GHz | 4x C-A55, 1.80 GHz | 4x C-A55, 1.80 GHz
GPU | Adreno (SD X Elite), 4.6 TFLOPS | Adreno (8cx Gen 3) | Adreno 690 | Adreno 680
NPU | Hexagon, 45 TOPS (INT8) | Hexagon (8cx Gen 3), 15 TOPS | Hexagon 690, 9 TOPS | Hexagon 690, 9 TOPS
Memory | 8 x 16-bit LPDDR5x-8533, 136 GB/sec | 8 x 16-bit LPDDR4x-4266, 68.3 GB/sec | 8 x 16-bit LPDDR4x-4266, 68.3 GB/sec | 8 x 16-bit LPDDR4x-4266, 68.3 GB/sec
Wi-Fi | Wi-Fi 7 + BT 5.4 (Discrete) | Wi-Fi 6E + BT 5.1 | Wi-Fi 6 + BT 5.1 | Wi-Fi 5 + BT 5.0
Modem | Snapdragon X65 (Discrete) | Snapdragon X55/X62/X65 (Discrete) | Snapdragon X55/X24 (Discrete) | Snapdragon X24 (Discrete)
Process | 4nm | Samsung 5LPE | TSMC N7 | TSMC N7

Starting with a high-level look at the chip, the Snapdragon X Elite is a high-performance SoC designed to power Windows-on-Arm laptops. Qualcomm isn’t listing any official TDPs, but the company has told us that the Elite is designed to scale across a “broad range” of thermal designs. Active cooling will be needed to get the most out of the Elite, but according to Qualcomm, passive/fanless designs are possible as well, and we should expect to see some retail devices designed as such.

Qualcomm is fabbing the chip on an unspecified 4nm process. Given their previous performance issues with Samsung’s 4nm line, it’s a very safe bet that they’re building this chip at TSMC – possibly using the N4P line. The silicon itself is a traditional monolithic die, so there is no use of chiplets or other advanced packaging here (though the wireless radios are discrete).

CPU: Oryon By The Dozen

The star of the show (if you’ll forgive the pun) is Oryon, Qualcomm’s new custom-designed Arm CPU core. Designed by the Nuvia team that Qualcomm acquired in 2021, Oryon is the first high-performance, fully-custom Arm CPU core created by Qualcomm in several years. And following multiple generations of lackluster Snapdragon Compute SoCs built out of Arm Cortex-A/X designs and functionally bigger versions of Qualcomm’s mobile SoCs, Oryon marks a major change in direction for Qualcomm.

Being that this is a preview, there are no significant architectural details to share on Oryon at this time. We don’t know the width, or various buffer sizes, execution ports, etc. But what we do know is that Qualcomm didn’t aim low with this SoC – the Nuvia team was working on a server-grade CPU core prior to their acquisition, and that kind of aggressive design has carried over into Oryon as well. Which, after all, was one of the major goals of Qualcomm’s acquisition, as they have desired a high performance CPU core to push them ahead of the other laptop (and eventually mobile) chip makers.

The Snapdragon X Elite SoC ships with 12 Oryon CPU cores – and that’s it. Unlike Qualcomm’s 8cx family of designs, there are no distinct “efficiency” and “performance” cores based on different microarchitectures; this is a homogenous CPU design, more akin to traditional PC processors. This means that Oryon needs to pull double duty, excelling in performance in heavy workloads without chewing up a bunch of power in light workloads.

The Oryon CPU cores are broken up into three clusters of 4 cores each. We’re still waiting on further technical details, of course, but it’s a safe assumption that each cluster is on its own power rail, so that unneeded clusters can be powered down when only a handful of cores are called for.

Just on this basis alone, Snapdragon X Elite looks like a far more potent performer than the 8cx chips it replaces. The 8cx Gen 3 offered just 4 performance cores (Cortex-X1) and another 4 efficiency cores (Cortex-A78), so Snapdragon X Elite will hit the streets with 50% more CPU cores, never mind the higher performance of those cores. For a laptop chip, Qualcomm is throwing a lot of CPU cores at the matter.

With regard to clockspeeds, in an all-core turbo workload, all 12 Oryon CPU cores can run at up to 3.8GHz, power and thermal headroom permitting. Meanwhile in lighter workloads, the chip supports turboing up to 4.3GHz on 2 cores. Qualcomm’s slide on this matter shows a core from each cluster, but it’s unclear whether this is some kind of prime/favored core in action (where only certain cores are designed/validated for those speeds) or if it’s simply a stylistic choice.

Either way, Qualcomm is aiming to turbo to relatively high clockspeeds for their laptop chip, a notable distinction from their much more modestly clocked 8cx chips. While high clockspeeds alone do not make for a fast chip, one of the performance bottlenecks of the 8cx chips was their pokey clockspeeds, so if Oryon offers as high an IPC rate as we suspect it will, then this would go a long way towards boosting Qualcomm’s CPU performance to compete with the industry’s strongest players.

Memory: 128-bit LPDDR5x

Feeding the beastly Oryon CPU cores (as well as the rest of the chip) is a 128-bit LPDDR5x memory bus. This is less remarkable than the CPU side of the chip, but it’s important to note all the same. With the previous 8cx chips only supporting LPDDR4x, this brings Qualcomm back to parity with the latest PC chips in terms of memory technology support. And with supported data rates as high as LPDDR5x-8533, this will give Qualcomm one of the fastest memory controllers on the market.
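
As a rough sanity check on the bandwidth figure in the table, peak theoretical memory bandwidth is simply the bus width in bytes multiplied by the transfer rate. The sketch below is our own illustration (the function name is hypothetical, not anything from Qualcomm), and real-world throughput will land below these peaks.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mtps: int) -> float:
    """Peak theoretical bandwidth in GB/sec (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mtps * 1e6 / 1e9


# 8 x 16-bit channels = 128-bit bus in both cases
x_elite = peak_bandwidth_gbs(128, 8533)    # LPDDR5x-8533
older_8cx = peak_bandwidth_gbs(128, 4266)  # LPDDR4x-4266

print(f"Snapdragon X Elite: {x_elite:.1f} GB/sec")   # about 136.5
print(f"8cx generation:     {older_8cx:.1f} GB/sec")  # about 68.3
```

In other words, at the same bus width, the move to LPDDR5x-8533 roughly doubles the peak bandwidth available to the SoC.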

Qualcomm is also quoting a total of 42MB of cache in the system sitting between the various processor blocks and system memory. Given the explicit mention of “total cache”, this is almost certainly L2 + L3. Previous Qualcomm designs have offered a 6MB shared L3 (last level) cache. If that’s the case again here, then that would mean there’s 3MB of L2 cache available for each CPU core – or some permutation thereof.
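
To make that arithmetic explicit, here is a minimal sketch. The 6MB L3 is purely our assumption carried over from previous Qualcomm designs, and the per-cluster split is just one possible permutation; Qualcomm has confirmed neither.

```python
TOTAL_CACHE_MB = 42   # Qualcomm's quoted "total cache"
ASSUMED_L3_MB = 6     # assumption: shared L3, as on previous Qualcomm designs
CPU_CORES = 12
CLUSTERS = 3          # three clusters of four cores each

# Whatever is not L3 is assumed to be L2
l2_total_mb = TOTAL_CACHE_MB - ASSUMED_L3_MB

l2_per_core_mb = l2_total_mb / CPU_CORES     # if L2 is private to each core
l2_per_cluster_mb = l2_total_mb / CLUSTERS   # if L2 is shared within a cluster

print(f"L2 total:       {l2_total_mb} MB")
print(f"L2 per core:    {l2_per_core_mb} MB")
print(f"L2 per cluster: {l2_per_cluster_mb} MB")
```

Either reading accounts for the same 36MB of L2; the difference is only in how it is shared.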

GPU: Latest Generation Adreno

On the graphics side of matters, Snapdragon X Elite incorporates Qualcomm’s latest generation Adreno GPU. As is typical for Qualcomm in these matters, the company is saying virtually nothing about the architecture employed here, though it goes without saying that this is the latest and greatest iteration of Qualcomm’s in-house GPU design.

From a feature perspective, this is a DirectX 12-class GPU with ray tracing support, mirroring the capabilities Qualcomm introduced with last year’s Snapdragon 8 Gen 2 mobile SoC. Within the Windows ecosystem, it will almost certainly qualify as a DirectX 12 Ultimate (feature level 12_2) design.

Qualcomm is quoting a single throughput figure for the design: 4.6 TFLOPS at an unspecified bit depth/format (we’d guess FP32). Qualcomm has not previously disclosed similar figures for the 8cx chips, so it’s hard to say how this will compare. Or even how it will compare to other integrated GPUs, since there’s a lot more to real-world GPU performance than pure FLOPS.

The display controller portion of the GPU offers support for up to 4 DisplayPort displays. Besides an internal display for the laptop, it can drive a further 3 external displays (all DP 1.4), with one output being 5K capable, while the rest are 4K.

Finally, the SoC is getting Qualcomm’s latest video processing block (VPU) as well. This latest design not only supports AV1 decoding, but in a first for a Qualcomm SoC, AV1 encoding as well.

NPU: Hitting Hard with Hexagon

Next to the use of Oryon CPU cores, Qualcomm’s other big bet with the Snapdragon X Elite SoC is on the AI/neural processing unit side of things with their latest generation Hexagon NPU. Qualcomm is expecting that AI use will continue to rapidly grow over the next few years, and that the next big push is going to be AI models running locally on users’ systems. So they have invested a significant amount of resources in bulking up their Hexagon NPU for this generation of chips (X Elite and 8 Gen 3).

The end result is a heavily revised NPU, which should greatly exceed the 8cx Gen 3’s NPU performance. Qualcomm is quoting 45 TOPS of performance here for modest precision INT8, whereas the 8cx Gen 3 was previously quoted at 15 TOPS for an unspecified data format.

Unlike their CPU and GPU, Qualcomm is sharing some architectural details here about the NPU, and what they’ve done to boost its performance. The tensor accelerator block, used in the densest matrix math, is outright 2.5x faster than before. Backing that (and the rest of the NPU) is a 2x larger shared memory/cache (though Qualcomm is not disclosing the actual size). Qualcomm is targeting large language models (LLMs) in particular with this change, as these are notoriously memory bound; according to the company, the chip will have enough resources to run a 13 billion parameter Llama 2 model locally.
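
To illustrate why LLMs are so memory bound, consider a back-of-the-envelope sketch. Qualcomm has not said which precision its Llama 2 claim uses, so the formats below are our assumptions, and the model is deliberately crude: it counts weights only (no KV cache or activations) and assumes every weight is read from DRAM once per generated token, which gives an upper bound rather than a performance prediction.

```python
PARAMS = 13e9  # 13 billion parameter model

# Assumed weight formats; Qualcomm has not stated which one it used
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

# Peak theoretical bandwidth of a 128-bit LPDDR5x-8533 bus, in GB/sec
PEAK_BANDWIDTH_GBS = 128 / 8 * 8533e6 / 1e9


def weights_gb(fmt: str) -> float:
    """Memory needed just to hold the model weights, in GB."""
    return PARAMS * BYTES_PER_PARAM[fmt] / 1e9


def max_tokens_per_sec(fmt: str) -> float:
    """Bandwidth-bound ceiling if all weights are streamed once per token."""
    return PEAK_BANDWIDTH_GBS / weights_gb(fmt)


for fmt in BYTES_PER_PARAM:
    print(f"{fmt}: {weights_gb(fmt):.1f} GB of weights, "
          f"at most {max_tokens_per_sec(fmt):.1f} tokens/sec")
```

Under these assumptions, a 13B model needs about 13GB for its weights at INT8 and tops out at roughly 10 tokens per second regardless of how many TOPS the NPU offers, which is why larger on-chip memory and lower-precision formats matter so much for this workload.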

Qualcomm has also made some power delivery changes to help drive more performance/efficiency out of the NPU. The power-hungry tensor block is now on its own power rail, with the rest of the NPU sitting on a separate shared rail. The company has also made some further undisclosed improvements to how they handle micro-tiling of inferencing workloads, which directly impacts how well they can split up workloads to keep the various sub-blocks of the NPU as busy as possible while minimizing intermediate memory operations.

I/O: USB4, PCIe 4, & Discrete Wi-Fi 7

Rounding out the Snapdragon X Elite, let’s talk I/O.

For internal I/O, the SoC offers PCIe 4.0 connectivity for NVMe storage. Elsewhere, the company is using PCIe 3 to supply connectivity to their modem and Wi-Fi solutions. No mention has been made of whether there are any free PCIe lanes for further peripherals.

For external I/O, the SoC supports USB4. According to Qualcomm, it can drive up to 3 such Type-C ports, and there are also a pair of USB 3.2 Gen2 outputs, and a single USB 2.0 output for internal use.

As noted earlier, both Wi-Fi and the modem are discrete for this product. The chip is intended to be paired with Qualcomm’s FastConnect 7800 silicon in the form of an M.2 card. The 7800 is their latest-generation Wi-Fi 7 solution, with support for 4 spatial streams as well as Bluetooth 5.4. The modem pairing is the Snapdragon X65, a high-performance 5G modem which was also available for the 8cx Gen 3.

The fact that neither wireless system is integrated into the SoC is unusual for Qualcomm, but perhaps not too surprising since they want to bring the Elite to market ASAP. Integrating these modules would take further time, and as a laptop SoC, Qualcomm doesn’t need to be as space efficient. In any case, the official line from Qualcomm is that the discrete modem is for OEM flexibility – to give OEMs the option to either include a modem or not – though Qualcomm of course will be strongly encouraging OEMs to include one as a major feature differentiator of the platform.

Performance Claims

As we don’t have enough architectural details to make any meaningful performance projections, the best thing we have for now are Qualcomm’s vague comparisons to their competitors. This is also the closest thing Qualcomm has provided to energy efficiency data for the chip (though, as always, target clockspeeds for a SKU play a massive part there).

With 12 performance cores, Qualcomm is pushing hard on multi-threaded performance. In fact, multi-threaded performance is the only CPU performance comparison Qualcomm makes, as there are no single-threaded comparisons to speak of. Make of that what you will.

Against what is implied to be an Intel 12-core mobile CPU design, Qualcomm is reporting that Snapdragon X Elite delivers 2x the multi-threaded performance in Geekbench 6. Or at iso-performance, they hit the same mark at one-third the power consumption.

Even against Intel’s best 14-core (H-class) chips, Qualcomm still reports that they lead by 60% in performance, and again are consuming one-third the power at iso-performance. Undoubtedly, a lot of this is down to the process node used, as TSMC N4 should be delivering a significant advantage over the Intel 7 process used on Intel’s current chips. This is also why the “moving target” aspect is so critical, as Snapdragon X Elite should be competing with the Intel 4 based Meteor Lake lineup by the time it launches next year.

More interesting, perhaps, is that Qualcomm is reporting a 50% multi-threaded performance advantage over an unspecified “Arm-based competitor.” This is meant to imply Apple, but depending on just how vague Qualcomm wishes to be, MediaTek does offer some Windows-on-Arm chips as well.

Qualcomm also expects to lead in GPU performance in 3DMark Wildlife Extreme. Which again, with a process node advantage and a tendency to build bigger iGPUs overall, is not surprising.

As always, these claims should be taken with a large grain of salt, especially for a platform that is still several months away from launching.

Snapdragon X Elite: Coming Mid-2024

Wrapping things up, Qualcomm is at this point putting the final touches on the Snapdragon X Elite. The company has deemed it one of their “most pivotal platform announcements in the company's recent history”, and for good reason. The Oryon CPU core being introduced here will eventually be at the heart of a good deal more products, so how competitive Oryon is will make or break Qualcomm’s next few generations of designs.

Devices based on the Snapdragon X Elite should be available in mid-2024. On that schedule, the Snapdragon X Elite should be competing against Intel’s Meteor Lake (Core Ultra) chips, AMD’s Phoenix chips (Ryzen Mobile 7000), and whatever the latest available iteration is of Apple’s M-series chips.

84 Comments

  • techconc - Thursday, October 26, 2023 - link

    You do know that you can dual boot an ARM based MacBook in Asahi Linux, right?
  • abufrejoval - Wednesday, October 25, 2023 - link

    On the single peak clock per cluster issue:

    I can see that being a technical constraint/optimization: due to the exponential power increase near the top of the CMOS-curve, the power delivery to any cluster may just be limited to a certain amount per cluster. It points toward very decoupled power delivery per cluster and then having to support peak clock rates on neighbouring cores doesn't just require accommodating clocks, but also heat. Two cores running at 4.3 side-by-side or in adjacent clusters just aren't quite the same up close.

    As tiny as these chips are, spreading it across the clusters may help reducing the amount of dark silicon otherwise required to sustain that and seems very much in line with what those Nuvia/Apple engineers have been doing all along.

    Then again the difference between 3.8 and 4.3 GHz is so low that I can't work up the least bit of outrage at these numbers.

    But I am wary we'll actually see 3.8GHz on all cores at ultrabook power ranges of 10-15 Watts.

    There is much less left in terms of clocks on my Ryzen 5800U notebook, once all 8 cores spring into sustained action and for all 4nm and Nuvia magic, physics are still hard to beat.

    But 12 cores at 3.8 on a NUCalike running a mix of Linux and Windows on a hypervisor with iGPU pass-through sounds really attractive...

    So let's make sure M$ keeps their greedy paws from locking down a market segment they are crazy enough to consider their own.
  • sjkpublic@gmail.com - Wednesday, October 25, 2023 - link

    Memory, NPU and GPU nice. No M.2? No hyperthreading? No power saving. Compare this to the latest AMD SOC?
  • techconc - Thursday, October 26, 2023 - link

    I think you're just naming marketing material without understanding what it is. Hyperthreading doesn't make sense on an ARM chip with a reasonably wide decoder. The short version is this, on Intel based designs you only have something like a 4-wide decoder whereas on Apple's latest A17 Pro it's up to a 9-wide decoder. This is because the ARM (RISC) instruction set is much more predictable in terms of length, etc. x86/x64 still has CISC based instructions that are less predictable in length. Hyperthreading is effectively just a hack to recapture some of the lost efficiency that comes with a poor decoder.
  • mode_13h - Sunday, October 29, 2023 - link

    SMT is about much more than simply circumventing decoder bottlenecks.

    See also: GPUs
  • techconc - Monday, October 30, 2023 - link

    No, SMT is not more than a hack to make up for a poor decoder. It’s also proven to be a security liability in many implementations as well. Don’t get me wrong, it’s a clever hack and an example of Intel pushing a legacy ISA far beyond what should be possible. But, at the end of the day, it’s a hack and not needed in modern chip/ISA designs.
  • tipoo - Friday, October 27, 2023 - link

    You don't need hyperthreading if you can fill the execution hardware without it, it's a band-aid for that. There was that Chinese designed Kirin SoC that just came out after the export bans with hyperthreading because they reused server cores, but people found it raised power use more than performance.
  • satai - Wednesday, October 25, 2023 - link

    Does it Linux?
  • yankeeDDL - Wednesday, October 25, 2023 - link

    The efficiency looks very good.
    Assuming that the Core 13900 uses ~2.5X the peak power of the Ryzen 7940, on the CPU side, the SD should be on par with Ryzen, which is not too bad.

    And the GPU looks good too, although it's easy to cherry-pick one parameter that can make it look better than the competition. Apples to apples comparison on GPU is difficult (more than on CPUs). But it seems a massive step forward from the Gen2. Catching up with Apple, finally.
    I'd want something like this in my phone (I know, this is for laptops - hopefully, something similar trickles to phones).
  • shabby - Wednesday, October 25, 2023 - link

    Lol at those graphs, I'm ready to be disappointed...
