CPU ST Performance: Faster & More Efficient

Starting off with this year’s review of the A15, we’re migrating over to SPEC CPU 2017 in order to take a deeper look at CPU single-threaded performance and power efficiency. While SPEC CPU 2006 has served us well over the years and is still important and valid, the microarchitectural behaviour of the 2017 suite’s components is now better understood, and the suite has become more relevant since we moved our desktop coverage over to it some time ago.

One continuing issue with SPEC CPU 2017 is the Fortran subtests; due to the lack of compiler infrastructure on both iOS and Android, we’re skipping these components entirely for mobile devices. This also means that the aggregate scores presented here are not comparable to the full-suite scores on other platforms, which we denote with the (C/C++) subscript in the score descriptions.

As always, because we’re running completely custom harnesses and not submitting the scores officially to SPEC, we have to denote the results as “estimates”, although we have high confidence in the accuracy.

In terms of compiler settings, we’re continuing to employ simple -Ofast flags without further changes, to be able to get the best cross-platform comparisons possible. On the iOS side of things, we’re running on the newest Xcode 13 build tools, while on Android we’re running the NDK r23 build tools.

In terms of performance and efficiency details, we’re swapping the graphs around a bit from now on. On the left axis we have the performance scores of the tests, where larger bars mean better performance. On the right-side axis, growing from right to left, are the energy consumption figures for the platforms: the smaller the figure, the less energy was consumed to complete a workload, and the more energy efficient the platform. Alongside the energy figure in Joules, we’re also showcasing the average power figure in Watts.

Starting off with the performance figures of the A15, we’re seeing increases across the board, with absolute performance going up from a low of 2.5% to a peak of +37%.

The lowest performance increase was found in 505.mcf_r, a more memory-latency-sensitive workload; given the increased L2 latency as well as slightly higher DRAM latency, a more minor performance increase isn’t too unexpected. However, when looking at the power and efficiency metrics of the same workload, we see the A15 use up almost 900mW less than the A14, with energy efficiency improving by +22%. 520.omnetpp_r saw the biggest individual increase at +37% performance; power here went up a bit, but energy efficiency is also up 24%.

The smallest performance gains of the A15 are found in the most back-end execution bound workloads: 525.x264_r and 538.imagick_r improve by only 8.7%, which works out to an IPC increase of 0.6%, essentially within the realm of measurement noise. Even in this worst performance case, Apple still managed to improve energy efficiency by +13%, as the new chip is using less absolute power even though clock frequencies have gone up.
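To make the IPC arithmetic above explicit, here is a minimal sketch assuming the usual first-order model that single-threaded performance is IPC multiplied by clock frequency. The function names are ours and purely illustrative; the only inputs are the +8.7% performance and +0.6% IPC figures quoted above.

```python
def implied_clock_gain(perf_gain: float, ipc_gain: float) -> float:
    """Clock uplift implied by perf = IPC x frequency (all gains as fractions)."""
    return (1.0 + perf_gain) / (1.0 + ipc_gain) - 1.0


def ipc_gain_from(perf_gain: float, clock_gain: float) -> float:
    """IPC uplift left over once the clock uplift is divided out."""
    return (1.0 + perf_gain) / (1.0 + clock_gain) - 1.0


# Figures quoted in the text for 525.x264_r / 538.imagick_r.
clock = implied_clock_gain(perf_gain=0.087, ipc_gain=0.006)
ipc = ipc_gain_from(perf_gain=0.087, clock_gain=clock)

print(f"implied clock uplift: {clock:.1%}")
print(f"IPC uplift recovered: {ipc:.1%}")
```

Under that model, the two quoted figures imply a clock uplift of roughly 8% generation over generation, which is why an 8.7% gain in these tests amounts to almost no per-clock improvement.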

The most power demanding workload, 519.lbm_r, is extremely bandwidth hungry and stresses the DRAM the most in the suite, with the A15 chip here eating a whopping 6.9W of power. Still, energy efficiency is generationally slightly improved, as performance goes up by 17.9%. Based on first teardown reports, the A15 is still only powered by LPDDR4X-class memory, so these improvements must be due to the chip’s new memory subsystem and new SLC.

Shifting things over to the efficiency cores, I wanted to make comparisons not only to the A14’s E-cores, but also to put the Apple chips in context with the competition, in this case a Snapdragon 888, comparing against a 2.41GHz Cortex-A78 mid-core as well as a 1.8GHz Cortex-A55 little core.

The A15’s E-cores are extremely impressive when it comes to performance. The improvement ranges from +8.4% in 531.deepsjeng_r, essentially just scaling with clocks, up to +46% in 520.omnetpp_r, adding more evidence for some sort of large improvement in effective sparse memory access parallelism for the chip. The core has a median performance improvement of +23%, resulting in a median IPC increase of +11.6%. The cores here don’t showcase the same energy efficiency improvement as the A15’s performance cores: energy consumption is mostly flat, as the performance increases come at the cost of power increases, although absolute power remains very low.

Compared to the Snapdragon 888, there’s quite a stark juxtaposition. First of all, Apple’s E-cores, although not quite as powerful as a middle core on Android SoCs, are still quite respectable and come close enough that we can at least view them in a similar performance class. The comparison against the little Cortex-A55 cores is more absurd though, as the A15’s E-core is 3.5x faster on average while consuming only 32% more power, so energy efficiency is 60% better. Even for the middle cores, if we were to down-clock them to match the A15’s E-core performance, their energy efficiency would be multiple factors off what Apple is achieving.
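As a rough sanity check of how the speed and power figures combine into energy, here is a small sketch. It assumes that energy is average power multiplied by runtime and that runtime scales inversely with the aggregate score, which is only approximately true for a geometric-mean score. The function name is hypothetical; the inputs are the 3.5x and +32% figures from the paragraph above.

```python
def relative_energy(speedup: float, power_ratio: float) -> float:
    """Energy of chip A relative to chip B for the same amount of work.

    energy = average power x runtime, and runtime is assumed to scale
    as 1 / speedup, so the ratio is simply power_ratio / speedup.
    """
    return power_ratio / speedup


# A15 E-core versus Cortex-A55: 3.5x faster, 32% more power.
energy = relative_energy(speedup=3.5, power_ratio=1.32)
perf_per_watt = 3.5 / 1.32

print(f"energy for the same work: {energy:.2f}x")
print(f"performance per watt:     {perf_per_watt:.2f}x")
```

Under these assumptions the E-core needs a bit under 0.4x the energy of the A55 for the same work, i.e. roughly 60% less, which is in line with the efficiency figure quoted above.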

In the overview graph, I’m also changing things a bit, and moving to bubble charts to better spatially represent the performance to energy efficiency positioning, as well as the performance to power positioning. In the energy axis graphs, which I personally find to be more representative of the comparative efficiency and resulting battery life experiences of the SoCs, we see the various SoCs at their peak CPU performance states versus the total energy consumed to complete the workloads. On the power axis graphs, we see the same data, only plotted against average power. Generally, I find it quite a bit harder to distinguish the efficiency of the various data-points in this view; however, some readers have requested it. The bubble size corresponds to the average power of the various CPUs. We’re measuring system active power, meaning total device workload power minus idle power, to compensate for components such as the display.
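The "active power" definition above can be sketched in a few lines. This is only an illustration under assumptions of ours: that device power is available as timestamped samples and that idle power is a single constant. The article does not describe the actual measurement setup, and the sample values below are made up purely to exercise the code.

```python
from typing import Sequence


def active_energy_joules(times_s: Sequence[float],
                         power_w: Sequence[float],
                         idle_w: float) -> float:
    """Integrate (device power - idle power) over time with the trapezoidal rule."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching samples")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        p0 = power_w[i - 1] - idle_w  # active power at the previous sample
        p1 = power_w[i] - idle_w      # active power at the current sample
        total += 0.5 * (p0 + p1) * dt
    return total


def average_active_power(times_s: Sequence[float],
                         power_w: Sequence[float],
                         idle_w: float) -> float:
    """Average active power = active energy / elapsed time."""
    elapsed = times_s[-1] - times_s[0]
    return active_energy_joules(times_s, power_w, idle_w) / elapsed


# Hypothetical trace: four samples one second apart, 1.0 W idle floor.
times = [0.0, 1.0, 2.0, 3.0]
power = [5.0, 5.5, 5.5, 5.0]
joules = active_energy_joules(times, power, idle_w=1.0)
watts = average_active_power(times, power, idle_w=1.0)
print(f"{joules:.1f} J active energy, {watts:.2f} W average active power")
```

The point of subtracting the idle floor is that components such as the display contribute the same power whether or not the CPU is busy, so leaving them in would penalise faster chips less and slower chips more.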

Apple’s A15 performance cores are extremely impressive here. Usually, increases in performance come with some sort of deficit in efficiency, or at best flat efficiency. Apple here instead has managed to reduce power whilst increasing performance, meaning energy efficiency is improved by 17% on the peak performance states versus the A14. If we had been able to measure both SoCs at the same performance level, this efficiency advantage of the A15 would grow even larger. In our initial coverage of Apple’s announcement, we theorised that the company might have invested in energy efficiency rather than performance increases this year, and I’m glad to see that seemingly this is exactly what has happened, explaining some of the more conservative (at least for Apple) performance improvements.
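The reason an iso-performance comparison would widen the gap can be illustrated with the textbook first-order model of CMOS dynamic power, P ≈ C·V²·f. This is a generic approximation, not something measured here, and the voltage and frequency values below are purely hypothetical placeholders rather than A14 or A15 operating points.

```python
def dynamic_power_ratio(v_ratio: float, f_ratio: float) -> float:
    """Dynamic power relative to a baseline, using P ~ C * V^2 * f."""
    return (v_ratio ** 2) * f_ratio


def energy_per_cycle_ratio(v_ratio: float) -> float:
    """Energy per clock cycle relative to a baseline: P / f ~ C * V^2."""
    return v_ratio ** 2


# Hypothetical down-clock: 10% lower frequency allowing 10% lower voltage.
power = dynamic_power_ratio(v_ratio=0.9, f_ratio=0.9)
energy = energy_per_cycle_ratio(v_ratio=0.9)
print(f"power: {power:.3f}x, energy per cycle: {energy:.2f}x")
```

In this simplified picture, a chip that can hit the same performance target at a lower clock can also run at a lower voltage, and the energy per unit of work falls with the square of that voltage. Leakage and the rest of the SoC are ignored, so treat it as a direction of travel rather than a prediction.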

On an adjacent note, with a score of 7.28 in the integer suite, Apple’s A15 P-core is on equal footing with AMD’s Zen 3-based Ryzen 9 5950X with a score of 7.29, and ahead of the M1 with a score of 6.66.

The A15’s efficiency cores are also massively impressive: at peak performance, efficiency is flat, but they’re also +28% faster. Again, if we were able to compare both SoCs at the same performance level, the efficiency advantage of the A15’s E-cores would be very obvious. The much better performance of the E-cores also massively helps in avoiding use of the P-cores, further improving the energy efficiency of the SoC.

Compared to the competition, the A15 isn’t +50% faster as Apple claims, but rather +62% faster. While Apple’s larger cores are more power hungry, they’re still a lot more energy efficient. Granted, we are seeing a process node disparity in favour of Apple. The performance and efficiency of the A15 E-cores also put the rest of the pack to shame. The extremely competent performance of the 4 efficiency cores alongside the leading performance of the 2 big cores explains the significantly better multi-threaded performance compared to the 1+3+4 setups of the competition.

Overall, the new A15 CPUs are substantial improvements, even though that’s not immediately noticeable to some. The efficiency gains are likely key to the vastly longer battery life of the iPhone 13 series phones; more on that in a dedicated piece in a few days, and in our full device review.

Comments Locked



  • unclevagz - Monday, October 4, 2021 - link

    Thanks. Since AnandTech does have data on SPEC 2017 subtests with various x86 processors, it may also be helpful to show these results for selected x86 CPUs in the displayed graphs for ease of comparison.
  • Andrei Frumusanu - Monday, October 4, 2021 - link

    I thought about it but didn't want to complicate it too much given the power disparity.
  • Andrei Frumusanu - Monday, October 4, 2021 - link

    I added in performance marks for the x86 folks. Obviously no power data.
  • Kangal - Tuesday, October 5, 2021 - link

    Hey Andrei,
    The graphs for Spec-2017 Efficiency look quite off. They're showing the Cortex-A55 cores consuming considerably more energy than Apple's E-cores, and sometimes even more than the Cortex-A78 cores too, whilst performance seems as expected.

    The worst offender seems to be 544.nab_r, with a discrepancy of 0.60 perf / 682 J = ~0.001 p/J compared to 2.70 perf / 280 J = ~0.01 p/J. So that's an efficiency difference of ~x10, which is massive. And the best case for the A55 seems to be in the 541.leela_r test. Here we have 1.00 perf / 295 J = ~0.003 p/J compared to 2.49 perf / 264 J = ~0.009 p/J. So in this best-case scenario the efficiency difference is ~x3, which is still huge.

    I mean, I remember when Apple's E-cores were running slightly slower than the Cortex-A73 whilst using slightly more power than the Cortex-A53. But what we have here is just ridiculous. We have even less power draw than the Cortex-A55 or even the Cortex-A53, but performance is somewhere above the legendary Cortex-A76.

    I can't wrap my head around it. It feels like an impossibility. Is my maths checking out? Or does there seem to be an issue someplace in the data?
  • Andrei Frumusanu - Tuesday, October 5, 2021 - link

    Perf per joule is a bit of a weird metric that is superfluous, you want either perf/W or simply just Joules consumed for energy efficiency, so either 0.60 / 0.24W = 2.5ppW & 2.7 / 0.45W = 6ppW. You can argue about power curves and ISO-perf or ISO-power.

    In any case, the other thing to consider is that we're not just measuring the core, we're measuring the efficiency of the whole SoC, power delivery, DRAM as well. Some vendors aren't running things as efficiently as they should be, that's how you end up with those Exynos A55 results, contrasted for example to the MediaTek A55 results.
  • Kangal - Wednesday, October 6, 2021 - link

    I didn't know that; I thought we had the software just churn out how much power the module was using on its own. With that said, I don't think it would be a factor. Apple doesn't have anything special in the makeup of their silicon to make it more efficient than competitors. And even if they did have a notable advantage in the make-up of their silicon, this would be against something like a RockChip SoC, and not against a flagship Qualcomm SoC. The more feasible explanation would be that the QSD chip might be activating other co-processors like its NPU, and the task isn't actually being hardware-accelerated by it, but "software-encoded" by its targeted CPU (eg A55). Thus it's still running slow, but now it's wasting power by having other co-processors become active and not actually compute anything.
    .....Would something like this be a cause for concern, for future testing?

    Secondly, I used the Joules as that's what the graph was visually showing. I basically used it to find the best-case and worst-case scenario. I didn't really think hard about it. Since you've graphed it, and since you've recorded it, I figured you knew something that I didn't and prioritised Joules over Watts.

    Converting them to Watts, we instead get:
    (nab_r) 2.70/0.45 = 6.00 vs 2.50 = 0.6/0.24 ---> a difference of x2.4
    (leela_r) 2.49/0.40 = 6.23 vs 5.56 = 1.00/0.18 ----> a difference of x1.1

    But now, the graphs themselves need to be switched. For instance, the New Worst-case scenario is now: 520.omnetpp_r (~x3.4) from what I can see. Maybe I'll go through these benchmark figures properly on a weekend or something, unless you guys plan on doing something of the sort.

    So yes, these ranges do seem more reasonable. For starters, here we see the "IceStorm v2" cores are actually using about double the power of the "Cortex-A55" on half of the tests. This shatters my previous impression, that Apple's small cores were faster-than Cortex-A73 and used less-power than Cortex-A53. And that fits much neater into our general understanding about them, comparing small in-order cores, versus medium out-of-order cores.

    Can we change how the graphs are displayed from now on? Plot the Watts on the Right/Second x-axis instead of Joules. Or better yet, let's just strip out Joules entirely. I mean the third graph, the Energy-Axis should probably be deleted, and just keep the Power-Axis there instead? No?
  • Ppietra - Wednesday, October 6, 2021 - link

    Joules will always be the most correct parameter to assess efficiency, since it is the actual energy expended to do all the work.
    Power, on the other hand, can fluctuate through time while doing the work, so the Power value can be very deceiving, firstly because it might not be the actual average power usage, secondly because you need to do another calculation to actually measure efficiency.
  • Kangal - Wednesday, October 6, 2021 - link

    Do you know how they calculated the Watts? And how they calculated the Joules?

    To me, Watts makes much more sense in this context/comparison. Joules is a more "universal" measurement, and it might be useful in a niche, but I feel like it could be misused/abused easily when put out of context.

    How do we explain the HUGE discrepancy in the measurements between Watts and Joules? There is something else here I am not understanding.
  • Ppietra - Wednesday, October 6, 2021 - link

    For that you need to understand what is Power and what is Energy.
    If there is one parameter that can be misused to assess efficiency while doing a task, it’s Power, not Energy. What you don’t seem to account for is the Time variable, which affects how you can interpret Power.
  • michael2k - Wednesday, October 6, 2021 - link

    I wanted to specifically bring something up:
    "Apple doesn't have anything special in the makeup of their silicon to make it more efficient than competitors."

    A14: TSMC 5nm (N5)
    A15: TSMC 5nm (N5P)
    D1200: TSMC 6nm (N6)
    SD888: TSMC 5nm (N5)

    Technically Apple is one year ahead of Qualcomm and two or so ahead of MediaTek in terms of process.

    Looking at the SPECint2017 Power Axis graph we see on average that the A15 IceStorm v2 consumes 0.44W/2349J to achieve a 2.42 score, which puts it on par with the D1200's A78 and its 2.57 score, though the A78 gets there at a far higher power cost of 1.13W/6048J.

    In other words the A78 and A15 have very similar performance, which makes sense since there are many similarities in terms of number of execution units, width, etc. If you look at the older style charts you can see that the efficiency cores were far closer in performance to the A76 'performance' cores on the Kirin 990:

    Long story short, there don't seem to be any surprises. Apple has a process advantage, uses cores similar to ARM's performance cores for efficiency purposes, and does so by clocking them at 3/4 the speed to dramatically reduce the power draw. The A15e only hits 2.016GHz, while the A14e maxed out at 1.823GHz and the A13e at 1.728GHz.
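For readers following the Joules-versus-Watts discussion in the thread above, here is a small sketch using only the 544.nab_r figures quoted in the comments (score, Joules, and average Watts). The `Result` class and its field names are hypothetical, and the sketch assumes energy is average power multiplied by runtime.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    score: float   # SPEC-style score, higher is faster
    joules: float  # total energy to complete the test
    watts: float   # average power during the test

    @property
    def runtime_s(self) -> float:
        # energy = power x time, so time = energy / power
        return self.joules / self.watts

    @property
    def perf_per_watt(self) -> float:
        return self.score / self.watts


# 544.nab_r figures as quoted in the comments above.
a55 = Result("Cortex-A55", score=0.60, joules=682.0, watts=0.24)
a15e = Result("A15 E-core", score=2.70, joules=280.0, watts=0.45)

energy_advantage = a55.joules / a15e.joules
ppw_advantage = a15e.perf_per_watt / a55.perf_per_watt
runtime_ratio = a55.runtime_s / a15e.runtime_s
score_ratio = a15e.score / a55.score

print(f"energy advantage:   {energy_advantage:.2f}x")
print(f"perf/W advantage:   {ppw_advantage:.2f}x")
print(f"runtime ratio:      {runtime_ratio:.2f}x vs score ratio {score_ratio:.2f}x")
```

Both views land at roughly 2.4x: Joules already account for how long the test ran, so dividing the score by Joules counts performance a second time, which is where the ~x10 figure came from.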
