OoOE

You’re going to come across the phrase out-of-order execution (OoOE) a lot here, so let’s go through a quick refresher on what that is and why it matters.

At a high level, the role of a CPU is to read instructions from whatever program it’s running, determine what they’re telling the machine to do, execute them and write the result back out to memory.

The program counter within a CPU points to the address in memory of the next instruction to be executed. The CPU’s fetch logic grabs instructions in order. Those instructions are decoded into an internally understood format (a single architectural instruction sometimes decodes into multiple smaller instructions). Once decoded, all necessary operands are fetched from memory (if they’re not already in local registers), and the combination of instruction and operands is issued for execution. The results are committed to memory (registers/cache/DRAM), and it’s on to the next one.
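
To make the fetch, decode, operand fetch, execute and commit steps concrete, here is a minimal sketch of a toy in-order machine. The tuple instruction format, the opcodes and the `madd` instruction that cracks into two micro-ops are all hypothetical illustrations, not x86 or Atom internals, and the sketch keeps every operand in registers rather than modeling cache or DRAM.

```python
# Toy register machine: each instruction is (op, dst, a, b).
# For "li", a is an immediate value; otherwise a and b are source register numbers.

def decode(instr):
    """Decode an architectural instruction into one or more internal micro-ops."""
    op, dst, a, b = instr
    if op == "madd":
        # Hypothetical multiply-add (dst = dst + a * b) that cracks into two micro-ops,
        # using r3 as a scratch register (an assumption of this sketch).
        return [("mul", 3, a, b), ("add", dst, dst, 3)]
    return [instr]


def execute(uop, regs):
    """Fetch operands from the register file and compute the result."""
    op, dst, a, b = uop
    if op == "li":
        return dst, a                      # load immediate: no register operands
    x, y = regs[a], regs[b]                # operand fetch
    if op == "add":
        return dst, x + y
    if op == "mul":
        return dst, x * y
    raise ValueError(f"unknown op {op!r}")


def run(program, num_regs=4):
    regs = [0] * num_regs
    pc = 0                                 # program counter: index of the next instruction
    while pc < len(program):
        instr = program[pc]                # fetch, strictly in program order
        for uop in decode(instr):          # decode into micro-ops
            dst, value = execute(uop, regs)
            regs[dst] = value              # commit the result
        pc += 1                            # on to the next one
    return regs


program = [
    ("li", 0, 2, None),    # r0 = 2
    ("li", 1, 5, None),    # r1 = 5
    ("li", 2, 1, None),    # r2 = 1
    ("madd", 2, 0, 1),     # r2 = r2 + r0 * r1
]
print(run(program))  # [2, 5, 11, 10]
```

The final `madd` shows one architectural instruction turning into two internal operations, which is the case the parenthetical above describes.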

In-order architectures complete this pipeline in order, from start to finish. The obvious problem is that many steps within the pipeline are dependent on having the right operands immediately available. For a number of reasons, this isn’t always possible. Operands could depend on other earlier instructions that may not have finished executing, or they might be located in main memory - hundreds of cycles away from the CPU. In these cases, a bubble is inserted into the processor’s pipeline and the machine’s overall efficiency drops as no work is being done until those operands are available.
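
A minimal sketch of how those bubbles pile up on a single-issue, in-order core, assuming a hypothetical 100-cycle load that misses to main memory and 1-cycle adds (the latencies and the four-instruction program are illustrative, not Bonnell figures):

```python
# Hypothetical latencies in cycles: a load that misses all the way to main memory
# versus a simple ALU add. Real latencies vary by design and by cache level.
LATENCY = {"load": 100, "add": 1}

# (op, destination register, source registers)
PROGRAM = [
    ("load", "r1", ()),            # r1 = [memory], misses to DRAM
    ("add", "r2", ("r1", "r1")),   # depends on the load
    ("add", "r3", ("r4", "r4")),   # independent of the load
    ("add", "r5", ("r3", "r3")),   # depends only on the previous add
]


def in_order_schedule(program, latency):
    """Single-issue, in-order core: an instruction cannot issue before the one ahead of it.

    Returns the issue cycle of each instruction and the number of bubble cycles
    in which nothing issued.
    """
    ready = {}       # register -> cycle at which its value becomes available
    issue = []
    cycle = 0        # earliest cycle the next instruction may issue
    bubbles = 0
    for op, dst, srcs in program:
        operands_ready = max((ready.get(s, 0) for s in srcs), default=0)
        start = max(cycle, operands_ready)
        bubbles += start - cycle   # pipeline sits idle while operands are missing
        issue.append(start)
        ready[dst] = start + latency[op]
        cycle = start + 1
    return issue, bubbles


print(in_order_schedule(PROGRAM, LATENCY))  # ([0, 100, 101, 102], 99)
```

Note that the two adds writing `r3` and `r5` never touch the loaded value, yet they still wait behind the stalled add for 99 idle cycles.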

Out-of-order architectures attempt to fix this problem by allowing independent instructions to execute ahead of others that are stalled waiting for data. In both cases instructions are fetched and retired in-order, but in an OoO architecture instructions can be executed out-of-order to improve overall utilization of execution resources.
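
The same idea can be sketched as a toy out-of-order scheduler: instructions issue oldest-first whenever their operands are ready, but still retire in program order. This assumes perfect register renaming (so only read-after-write dependences matter), a single issue slot per cycle, an unlimited instruction window and an unlimited retire width; the latencies and program are the same hypothetical ones an in-order model would use.

```python
LATENCY = {"load": 100, "add": 1}   # hypothetical cycle counts

# (op, destination register, source registers)
PROGRAM = [
    ("load", "r1", ()),            # misses to DRAM
    ("add", "r2", ("r1", "r1")),   # waits on the load
    ("add", "r3", ("r4", "r4")),   # independent
    ("add", "r5", ("r3", "r3")),   # independent of the load
]


def out_of_order_schedule(program, latency):
    # Register renaming is assumed, so only true (read-after-write) dependences
    # constrain execution. Record which earlier instruction produces each source.
    last_writer = {}
    producers = []
    for i, (op, dst, srcs) in enumerate(program):
        producers.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dst] = i

    issue = [None] * len(program)   # cycle each instruction starts executing
    done = [None] * len(program)    # cycle each result becomes available
    cycle = 0
    while None in issue:
        # Scan oldest-first and issue the first instruction whose operands are ready.
        for i, (op, dst, srcs) in enumerate(program):
            if issue[i] is None and all(
                done[p] is not None and done[p] <= cycle for p in producers[i]
            ):
                issue[i] = cycle
                done[i] = cycle + latency[op]
                break                # single-issue: at most one per cycle
        cycle += 1

    # Retire in program order: an instruction commits only once it and every older
    # instruction have finished (unlimited retire width assumed for simplicity).
    retire = []
    last = 0
    for d in done:
        last = max(last, d)
        retire.append(last)
    return issue, retire


print(out_of_order_schedule(PROGRAM, LATENCY))  # ([0, 100, 1, 2], [100, 101, 101, 101])
```

The independent adds now execute in cycles 1 and 2, in the shadow of the load, instead of after it, while retirement order is unchanged. That is the "fetched and retired in-order, executed out-of-order" split described above.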

The move to an OoO paradigm generally comes with penalties to die area and power consumption, which is one reason the earliest mobile CPU architectures were in-order designs. The ARM11, ARM’s Cortex A8, Intel’s original Atom (Bonnell) and Qualcomm’s Scorpion core were all in-order. As performance demands continued to rise, and as new, smaller, lower-power transistors arrived, all of the players here started introducing OoO variants of their architectures. Although often referred to as out-of-order designs, ARM’s Cortex A9 and Qualcomm’s Krait 200/300 are only mildly OoO compared to Cortex A15. Intel’s Silvermont joins the Cortex A15 as a fully out-of-order design by modern standards. The move to OoO alone should be good for around a 30% increase in single-threaded performance vs. Bonnell.

Pipeline

Silvermont changes the Atom pipeline slightly. Bonnell featured a 16-stage in-order pipeline. One side effect of the design was that all operations, including those that didn’t need cache accesses (e.g. operations whose operands were in registers), had to go through three data cache access stages even though nothing happened during those stages. In going out-of-order, Silvermont allows instructions to bypass those stages if they don’t need data from memory, effectively shortening the mispredict penalty from 13 stages to 10. The integer pipeline depth now varies depending on the type of instruction, but you’re looking at a range of 14 - 17 stages.

Branch prediction, a staple of any progressive microprocessor architecture, improves tremendously with Silvermont. Silvermont takes Bonnell’s gshare branch predictor and significantly increases the size of all associated data structures. Silvermont also adds an indirect branch predictor. The combination of the larger predictors and the new indirect predictor should increase branch prediction accuracy.
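
For readers unfamiliar with gshare, here is a minimal sketch of the general technique: the branch address is XORed with a global history of recent branch outcomes, and the result indexes a table of 2-bit saturating counters. The table sizes, history length and initial counter state below are assumptions chosen for illustration; they are not Bonnell’s or Silvermont’s actual parameters, and the indirect predictor is not modeled.

```python
class GShare:
    """gshare sketch: (branch address XOR global history) indexes 2-bit counters."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # 0..3; start at "weakly not taken"
        self.history = 0                         # global branch history register

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2   # True means "predict taken"

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(predictor, pc, outcomes):
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(outcomes)


# A loop branch that is taken three times, then falls through, over and over.
pattern = [True, True, True, False] * 1000
print(accuracy(GShare(index_bits=0), 0x40, pattern))   # 0.74975 (one counter, no history)
print(accuracy(GShare(index_bits=10), 0x40, pattern))  # close to 1.0 once history warms up
```

With zero index bits the predictor degenerates to a single counter and mispredicts every loop exit; with more history bits, each position in the pattern maps to its own counter. This toy only shows why bigger history and tables can help; real gains depend on the workload.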

Couple better branch prediction with a lower mispredict latency and you’re talking about another 5 - 10% increase in IPC over Bonnell.
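
The two effects combine through the usual cycles-per-instruction arithmetic: each mispredicted branch adds roughly a pipeline refill’s worth of stall. A minimal sketch, assuming one stall cycle per stage of mispredict penalty. Only the 13 and 10 stage penalties come from the text above; the base CPI, branch frequency and mispredict rates are made-up illustrative inputs, not measured Bonnell or Silvermont figures.

```python
def ipc(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Instructions per cycle when every mispredicted branch stalls for penalty_cycles."""
    cpi = base_cpi + branch_fraction * mispredict_rate * penalty_cycles
    return 1.0 / cpi


# Illustrative inputs only; these are NOT measured Bonnell or Silvermont figures.
BASE_CPI = 1.0          # cycles per instruction with perfect branch prediction
BRANCH_FRACTION = 0.2   # assume one instruction in five is a branch

old = ipc(BASE_CPI, BRANCH_FRACTION, mispredict_rate=0.05, penalty_cycles=13)
new = ipc(BASE_CPI, BRANCH_FRACTION, mispredict_rate=0.04, penalty_cycles=10)
print(f"IPC gain: {new / old - 1:.1%}")  # IPC gain: 4.6%
```

The result is very sensitive to the assumed inputs; branch-heavy code with a worse starting predictor gains more, which is why such estimates are usually quoted as a range.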

174 Comments

  • R0H1T - Tuesday, May 7, 2013

    Let's see, umm, Snapdragon 600 & then there's this soon-to-be-released 800? So lemme get this straight: an unreleased product vs one that was available last year, Intel's latest (future, indefinite) vs old/dated (relatively) from ARM, seems fair to me!
  • ssiu - Monday, May 6, 2013

    Exactly the 2 points I wonder about too:

    (1) GPU performance -- 1/4 of an HD4000, about iPad 4 level -- so slower than e.g. PowerVR Rogue which should come out around the same time

    (2) more importantly, even if Intel can make competitive/superior product, can it survive on such low margin?
  • zeo - Wednesday, May 8, 2013

    Well, yes and no on point 1... The iPad is using a quad SGX544, and Rogue doesn't improve performance by such a massive amount that a single Rogue/Series 6 could beat a quad Series 5. So it's not that Rogue will necessarily be better than the Bay Trail GMA, but that it can scale higher in a multi-core configuration!

    On the margins, Intel is lowering their costs by moving to their 22nm fab, and despite the declining PC market they're still doing well, so they should be fine for the foreseeable future... They'll have to do terribly in all markets to really start hurting now, and that's not likely yet...
  • andrewaggb - Monday, May 6, 2013

    Too early to say, I think. This Atom should be pretty good. If it's both twice as fast as the old Atom and uses less power (which I believe is what they are trying to tell us), that's pretty good. It will be competing with 2nd-gen A15 designs or better, so the current performance claims are largely meaningless. GPU performance continues to be an issue; aiming for last year's performance is definitely way too low. Fortunately gpu speed can normally be scaled more quickly than cpu speed, but Intel seems to consistently underspec on GPU, so I doubt they'll do better this time. Unless they go Haswell-style and have various different GPU SKUs. Guess we'll see.

    Considering how much success Rambus has had suing everybody, I think if Intel wanted to, they could probably sue anybody working on advanced processor designs without sufficient licensing arrangements. Drive the minimum cost up a bit so the margins are higher.
  • R0H1T - Tuesday, May 7, 2013

    This comment is hilarious ~ "gpu speed can normally be scaled more quickly than cpu speed" that's only if you're packing moar cores i.e. like SNB<IVB<<Haswell !

    GPUs cannot be scaled for performance unless there's some major redesign of the underlying architecture, like AMD's transition to GCN. So unless you've got some insider info on how Intel plans to use their superior Iris (Pro) graphics in Silvermont, I see this myth of yours, about Intel's superior graphics, being busted yet again, only this time in the mobile arena!
  • ominobianco - Monday, May 6, 2013

    If you had actually read the article, you would know that they are comparing against performance PROJECTIONS of competitors' parts available at product launch time, NOT current parts.
  • zeo - Wednesday, May 8, 2013

    Sorry, but 64-bit ARMv8 parts aren't coming out until the latter half of 2014 at the earliest, and they're pushing to be on 16nm rather than 20nm, which may delay them further!

    There are no major improvements planned for ARM until then! Many of the original Cortex A15 SoC releases have been delayed from 2012 to 2013!
  • MrSpadge - Monday, May 6, 2013

    Error: On page 1 you correctly write "Remember that power scales with the square of voltage". Almost immediately followed by "At 1V, Intel’s 22nm process gives ... or at the same performance Intel can run the transistors at 0.8V - a 20% power savings."
    Ouch - forgot that square!
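
    Worked through, assuming dynamic switching power dominates (ignoring leakage) and that frequency and switched capacitance stay the same, the quadratic dependence gives a 36% saving rather than 20%:

```latex
P_{\text{dyn}} \approx \alpha C V^2 f
\quad\Rightarrow\quad
\frac{P(0.8\,\text{V})}{P(1.0\,\text{V})} = \left(\frac{0.8}{1.0}\right)^2 = 0.64,
\qquad 1 - 0.64 = 0.36
```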
  • dusk007 - Monday, May 6, 2013

    I thought we would have to wait for 14nm for Intel to definitely pull ahead. This looks very promising.
    Now my perfect smartphone would sport a dual-core Silvermont with a 4000mAh battery, the HTC One camera and an otherwise durable build.
    GPU I don't care about as long as it is good enough for the GUI; I don't play games that would require something fast. Thin? Not at the cost of a smaller battery.
    I would love feature-phone-like battery life. Triple what we have to deal with now would be incredible, and it seems possible to me. Maybe the x86 version of the Motorola Phone X can deliver that.
    Camera is secondary and I don't need a 1080p screen. Just 4.3-4.5" of 720p and long battery life.

    I feel like battery life is where this new generation can really promise new things. 32nm Atom already does really well in tablets compared to quad-core ARM competition. It will be a waste if they add 1500mAh batteries, though. I hope they finally realize, now that smartphones are mainstream, that a lot of people care first about battery life and second about 7mm thinness.
  • beginner99 - Tuesday, May 7, 2013

    Agree. Current phones are too big, 1080p is pretty much useless and wastes battery life, and even the GPU in Medfield is good enough for the GUI. The lower screen resolution of course also means you don't need as powerful a GPU, and with both you save on power. I want a phone I need to charge once a week, not every day.
