After Swift Comes Cyclone Oscar

I was fortunate enough to receive a tip last time that pointed me at some LLVM documentation calling out Apple’s Swift core by name. Scrubbing through those same docs, it seems like my leak has been plugged. Fortunately I came across a unique string looking at the iPhone 5s while it booted:

I can’t find any other references to Oscar online, in LLVM documentation or anywhere else of value. I also didn’t see Oscar references on prior iPhones, only on the 5s. I’d heard that this new core wasn’t called Swift, referencing just how different it was. Obviously Apple isn’t going to tell me what it’s called, so I’m going with Oscar unless someone tells me otherwise.

Oscar is a CPU core inside the M7; Cyclone is the name of the Swift replacement.

Cyclone likely resembles a beefier Swift core (or at least a Swift-inspired one) more than a new design from the ground up. That means we’re likely talking about a 3-wide front end, and somewhere in the range of 5 to 7 execution ports. The design is likely also capable of out-of-order execution, given the performance levels we’ve been seeing.

Cyclone is a 64-bit ARMv8 core and not some Apple designed ISA. Cyclone manages to not only beat all other smartphone makers to ARMv8 but also key ARM server partners. I’ll talk about the whole 64-bit aspect of this next, but needless to say, this is a big deal.

The move to ARMv8 comes with some of its own performance enhancements. More registers, a cleaner ISA, improved SIMD extensions/performance as well as cryptographic acceleration are all on the menu for the new core.

Pipeline depth likely remains similar (maybe slightly longer) as frequencies haven’t gone up at all (1.3GHz). The A7 doesn’t feature support for any thermal driven CPU (or GPU) frequency boost.

The most visible change to Apple’s first ARMv8 core is a doubling of the L1 cache size: from 32KB/32KB (instruction/data) to 64KB/64KB. Along with this larger L1 cache comes an increase in access latency (from 2 clocks to 3 clocks from what I can tell), but the increase in hit rate likely makes up for the added latency. Such large L1 caches are quite common with AMD architectures, but unheard of in ultra mobile cores. A larger L1 cache will do a good job keeping the machine fed, implying a larger/more capable core.
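The latency-versus-hit-rate trade-off can be illustrated with the textbook average memory access time formula, AMAT = hit time + miss rate × miss penalty. This is only a sketch: the 2- and 3-clock hit times come from the measurements above, while the miss rates and the 20-cycle miss penalty are made-up placeholders, not measured A6 or A7 figures.

```python
def amat(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time in cycles (textbook formula)."""
    return hit_cycles + miss_rate * miss_penalty_cycles


def break_even_miss_rate(old_hit, old_miss_rate, new_hit, miss_penalty_cycles):
    """Miss rate the slower, larger cache must stay below to come out ahead."""
    return old_miss_rate - (new_hit - old_hit) / miss_penalty_cycles


# Hit times are from the text; everything else is a hypothetical placeholder.
MISS_PENALTY = 20  # assumed cycles to service an L1 miss from L2

small_l1 = amat(hit_cycles=2, miss_rate=0.10, miss_penalty_cycles=MISS_PENALTY)
large_l1 = amat(hit_cycles=3, miss_rate=0.04, miss_penalty_cycles=MISS_PENALTY)
threshold = break_even_miss_rate(2, 0.10, 3, MISS_PENALTY)

print(f"2-clock L1, 10% misses: {small_l1:.2f} cycles per access")
print(f"3-clock L1,  4% misses: {large_l1:.2f} cycles per access")
print(f"break-even miss rate for the 3-clock L1: {threshold:.1%}")
```

With these placeholder numbers the larger cache wins as long as its miss rate stays under the break-even value; with a cheaper miss penalty the extra clock of hit latency would be harder to earn back.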

The L2 cache remains unchanged in size at 1MB shared between both CPU cores. L2 access latency is improved tremendously with the new architecture. In some cases I measured L2 latency at half of what I saw with Swift.

The A7’s memory controller sees big improvements as well. I measured 20% lower main memory latency on the A7 compared to the A6. Branch prediction and memory prefetchers are both significantly better on the A7.

I noticed large increases in peak memory bandwidth on top of all of this. I used a combination of custom tools as well as publicly available benchmarks to confirm all of this. A quick look at Geekbench 3 (prior to the ARMv8 patch) gives a conservative estimate of memory bandwidth improvements:

Geekbench 3.0.0 Memory Bandwidth Comparison (1 thread)

                  Stream Copy   Stream Scale   Stream Add   Stream Triad
Apple A7 1.3GHz   5.24 GB/s     5.21 GB/s      5.74 GB/s    5.71 GB/s
Apple A6 1.3GHz   4.93 GB/s     3.77 GB/s      3.63 GB/s    3.62 GB/s
A7 Advantage      6%            38%            58%          57%

We see anywhere from a 6% improvement in memory bandwidth to nearly 60% running the same Stream code. I’m not entirely sure how Geekbench implemented Stream and whether or not we’re actually testing other execution paths in addition to (or instead of) memory bandwidth. One custom piece of code I used to measure memory bandwidth showed nearly a 2x increase in peak bandwidth. That may be overstating things a bit, but needless to say this new architecture has a vastly improved cache and memory interface.
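For reference, the four kernels of the classic STREAM benchmark are simple vector loops. The sketch below follows the standard STREAM definitions, not Geekbench's implementation, which remains unknown here; the function names and the `run_stream` helper are made up for illustration. Pure Python lists mostly measure interpreter overhead rather than DRAM, so the GB/s figures it prints are nominal and only the structure of each kernel and the byte accounting are the point.

```python
import time

BYTES_PER_ELEMENT = 8  # STREAM operates on 64-bit doubles


def stream_copy(a, b, c, s):
    c[:] = a                                   # c[i] = a[i]


def stream_scale(a, b, c, s):
    b[:] = [s * x for x in c]                  # b[i] = s * c[i]


def stream_add(a, b, c, s):
    c[:] = [x + y for x, y in zip(a, b)]       # c[i] = a[i] + b[i]


def stream_triad(a, b, c, s):
    a[:] = [y + s * z for y, z in zip(b, c)]   # a[i] = b[i] + s * c[i]


# Kernel -> (function, arrays read plus arrays written per element)
KERNELS = {
    "Copy": (stream_copy, 2),
    "Scale": (stream_scale, 2),
    "Add": (stream_add, 3),
    "Triad": (stream_triad, 3),
}


def run_stream(n=100_000, scalar=3.0):
    """Run each kernel once and report nominal GB/s (hypothetical helper)."""
    a, b, c = [1.0] * n, [2.0] * n, [0.0] * n
    gbps = {}
    for name, (kernel, arrays_touched) in KERNELS.items():
        start = time.perf_counter()
        kernel(a, b, c, scalar)
        elapsed = max(time.perf_counter() - start, 1e-12)  # avoid divide by zero
        bytes_moved = arrays_touched * BYTES_PER_ELEMENT * n
        gbps[name] = bytes_moved / elapsed / 1e9
    return a, b, c, gbps


if __name__ == "__main__":
    for name, rate in run_stream()[3].items():
        print(f"Stream {name}: {rate:.2f} GB/s (nominal)")
```

Copy and Scale touch two arrays per element while Add and Triad touch three, and Scale, Add and Triad each add floating point work on top of the memory traffic. That is one reason results for the four kernels need not move in lockstep.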

Looking at low level Geekbench 3 results (again, prior to the ARMv8 patch), we get a good feel for just how much the CPU cores have improved.

Geekbench 3.0.0 Compute Performance

                  Integer (ST)   Integer (MT)   FP (ST)   FP (MT)
Apple A7 1.3GHz   1065           2095           983       1955
Apple A6 1.3GHz   750            1472           588       1165
A7 Advantage      42%            42%            67%       67%
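The "A7 Advantage" rows are simple ratios of the A7 and A6 results. A minimal sketch, assuming the percentages were truncated rather than rounded (an assumption that happens to reproduce every entry in both tables); the helper name is made up, and the bandwidth figures are expressed in units of 0.01 GB/s so the arithmetic stays in integers:

```python
def advantage_pct(new, old):
    """Whole-percent advantage of `new` over `old`, truncated (integer math)."""
    return (new - old) * 100 // old


# Geekbench 3.0.0 compute scores from the table: (A7, A6)
compute = {
    "Integer (ST)": (1065, 750),
    "Integer (MT)": (2095, 1472),
    "FP (ST)": (983, 588),
    "FP (MT)": (1955, 1165),
}

# Geekbench 3.0.0 memory bandwidth in units of 0.01 GB/s: (A7, A6)
bandwidth = {
    "Stream Copy": (524, 493),
    "Stream Scale": (521, 377),
    "Stream Add": (574, 363),
    "Stream Triad": (571, 362),
}

compute_adv = {name: advantage_pct(a7, a6) for name, (a7, a6) in compute.items()}
bandwidth_adv = {name: advantage_pct(a7, a6) for name, (a7, a6) in bandwidth.items()}

print(compute_adv)
# {'Integer (ST)': 42, 'Integer (MT)': 42, 'FP (ST)': 67, 'FP (MT)': 67}
print(bandwidth_adv)
# {'Stream Copy': 6, 'Stream Scale': 38, 'Stream Add': 58, 'Stream Triad': 57}
```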

Integer performance is up 44% on average, while floating point performance is up by 67%. Again this is without 64-bit or any other enhancements that go along with ARMv8. Memory bandwidth improves by 35% across all Geekbench tests. I confirmed with Apple that the A7 has a 64-bit wide memory interface, and we're likely talking about LPDDR3 memory this time around so there's probably some frequency uplift there as well.
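For a sense of scale, the theoretical peak of a DRAM interface is bus width times transfer rate. The sketch below applies that formula to a 64-bit interface. The two transfer rates are purely illustrative assumptions, since the actual memory speeds of the A6 and A7 are not stated here, and the function name is made up.

```python
def peak_bandwidth_gbs(bus_width_bits, transfers_per_second):
    """Theoretical peak bandwidth in GB/s: bytes per transfer x transfers/s."""
    return bus_width_bits / 8 * transfers_per_second / 1e9


BUS_WIDTH_BITS = 64  # interface width confirmed in the text

# Hypothetical transfer rates, chosen only to show the effect of a speed bump.
EXAMPLE_RATES = {
    "1066 MT/s (assumed)": 1066e6,
    "1600 MT/s (assumed)": 1600e6,
}

peaks = {
    label: peak_bandwidth_gbs(BUS_WIDTH_BITS, rate)
    for label, rate in EXAMPLE_RATES.items()
}

for label, gbs in peaks.items():
    print(f"64-bit interface at {label}: {gbs:.2f} GB/s peak")
```

Measured single-threaded Stream numbers will always sit below a theoretical peak like this, so the two should not be compared directly.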

The result is something Apple refers to as desktop-class CPU performance. I’ll get to evaluating those claims in a moment, but first, let’s talk about the other big part of the A7 story: the move to a 64-bit ISA.


464 Comments


  • flyingpants1 - Wednesday, September 18, 2013

    Sorry, no. As much as I hate Apple products.. Apple is killing it with amazing battery life, some of the best IPC, possibly best thermals (and therefore lowest profile) and the BEST performance, all at the same time.

    Whatever you said about testing multiple phones doesn't matter. It doesn't change facts.
  • melgross - Wednesday, September 18, 2013

    The Moto X hardly competes at all. What he said is true. We're talking an SoC that runs at a lower speed, often considerably slower. When Apple comes out with new iPads in October, they will increase clock speed by 15% or so, and often with more GPU cores as well, usually by 50%. We'll see what the chip can really do then. To be compared to tablets, and come ahead is pretty damning to other processor manufacturers.
  • jeffkibuule - Wednesday, September 18, 2013

    When you design both hardware and software, you get to optimize far more than when an OEM gets code from Google and *attempts* to get it working best on their hardware. And let's be clear, Android OEMs are clearly throwing more hardware at a software problem, when even a Windows phone with a 1.5 year old SoC in the Lumia 1020 feels far smoother and more fluid than the latest Android phone.
  • ananduser - Wednesday, September 18, 2013

    Simple...in the current smartphone game of leapfrog, the last one to release its flagship is the top dog. This time the A7 sports a brand new, benchmark "friendly", ARM designed ISA. Next time Exynos will have the upper hand, or Qualcomm, or nVidia.

    The GPU is made by Imagination not by Apple. The other manufacturers are trying to push their own solution rather than making do with a 3rd party. Technically any other manufacturer could pay Imagination for their Rogue chipset.

    Good for you, really.
  • akdj - Wednesday, October 9, 2013

    "This time the A7 sports a brand new, benchmark "friendly", ARM designed ISA. Next time Exynos will have the upper hand, or Qualcomm, or nVidia."
    You're right---of course, as technology progresses----but the funny thing is, even the 'year old' iPhone 5 holds its own against even the latest flagship Android devices when it comes down to graphics and browsing/energy (battery life). Apple's NOT just buying off the shelf parts from Imagination----they've hired many a chip expert/designer in house and are now not only optimizing their S/W code to the chip---they're also designing the chip's architecture to their system with Imagination and in house skill sets. No other company is doing this yet...and with the new A8 instruction set---this is the best I've seen in 's' updates since the release of the 3GS.
    It's a strong move by Apple, these past few years---to apply their own instruction and low level programming as well as SoC design, match up to iOS...the software integrate(ability) with the hardware....and perhaps most importantly, their relationship with carriers (good or terrible, it's irrelevant) as they're huge sellers and contract 'getters' --- and the inability for the carriers to add their own skin has been brilliant. That, to me...and IMHO was one of the great 'feats' Steve Jobs pulled off. Obviously it only initially worked with AT&T----but Verizon almost immediately was begging to get on board and now they've not only managed to penetrate small mom n pop services in the US---but open themselves up in China, India and Japan to some of the largest carriers (and population served) in the world.
    To me, this IS the route Google should take---and rein in both the OEMs and the carriers, but why would they care? They're not hardware makers (for profit anyways)---they're miners, advertiser first aid kits----Data Miners. Your Information is their Money. Period.
    Regardless of who makes the SoC, as long as there continues to be bloated Java skins on top of Android, the 'experience' with Apple will be 'better' when it comes to fluency, updates, app selection and app development (Money Paid to developers), et al.
    J
  • Miserphillips - Tuesday, September 17, 2013

    iOS 7 very heavily copied Windows Phone.
  • melgross - Wednesday, September 18, 2013

    It's nothing at all like that laggard—thankfully.
  • nephipower - Tuesday, September 17, 2013

    Can you share how much more space is used on the device because the native apps are now 64 bit?
  • solipsism - Tuesday, September 17, 2013

    Like all additional binaries it's negligible. The times you'll notice extra space being used is when you need additional resources, like images for 1x and 2x display (i.e.: Retina) or making a Universal app that needs to support both iPhone and iPad UIs.
  • tech01x - Wednesday, September 18, 2013

    Actually, no extra space for images. It's the pointers that are double in size, so depending on the application, there will be additional RAM usage.
