OoOE

You’re going to come across the phrase out-of-order execution (OoOE) a lot here, so let’s go through a quick refresher on what that is and why it matters.

At a high level, the role of a CPU is to read instructions from whatever program it’s running, determine what they’re telling the machine to do, execute them and write the result back out to memory.

The program counter within a CPU points to the address in memory of the next instruction to be executed. The CPU’s fetch logic grabs instructions in order. Those instructions are decoded into an internally understood format (a single architectural instruction sometimes decodes into multiple smaller instructions). Once decoded, all necessary operands are fetched from memory (if they’re not already in local registers) and the combination of instruction + operands is issued for execution. The results are committed to memory (registers/cache/DRAM) and it’s on to the next one.
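To make the fetch/decode/execute/commit loop concrete, here is a minimal sketch of a toy machine in Python. Everything in it is an assumption made for illustration: the four opcodes, the tuple encoding and the `run` function are hypothetical and do not correspond to any real ISA or to how Atom is built.

```python
# Toy machine: the opcodes, the tuple encoding and run() are hypothetical,
# chosen only to illustrate the fetch/decode/execute/commit loop.

def run(program, memory, num_regs=4):
    """Execute a list of (opcode, dst, a, b) tuples strictly in order."""
    regs = [0] * num_regs
    pc = 0                                # program counter: index of the next instruction
    while pc < len(program):
        instr = program[pc]               # fetch
        op, dst, a, b = instr             # decode
        if op == "load":                  # operand comes from memory
            regs[dst] = memory[a]
        elif op == "add":                 # operands come from registers
            regs[dst] = regs[a] + regs[b]
        elif op == "mul":
            regs[dst] = regs[a] * regs[b]
        elif op == "store":               # commit a register value to memory
            memory[dst] = regs[a]
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1                           # on to the next one
    return regs, memory


program = [
    ("load", 0, 0, None),    # r0 = mem[0]
    ("load", 1, 1, None),    # r1 = mem[1]
    ("add", 2, 0, 1),        # r2 = r0 + r1
    ("mul", 3, 2, 2),        # r3 = r2 * r2
    ("store", 2, 3, None),   # mem[2] = r3
]
regs, memory = run(program, [3, 4, 0])
print(regs, memory)
```

Note that this loop finishes one instruction completely before touching the next. A real pipeline overlaps these steps across many instructions, which is exactly where the ordering question comes in.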

In-order architectures complete this pipeline in order, from start to finish. The obvious problem is that many steps within the pipeline are dependent on having the right operands immediately available. For a number of reasons, this isn’t always possible. Operands could depend on other earlier instructions that may not have finished executing, or they might be located in main memory - hundreds of cycles away from the CPU. In these cases, a bubble is inserted into the processor’s pipeline and the machine’s overall efficiency drops as no work is being done until those operands are available.

Out-of-order architectures attempt to fix this problem by allowing independent instructions to execute ahead of others that are stalled waiting for data. In both cases instructions are fetched and retired in-order, but in an OoO architecture instructions can be executed out-of-order to improve overall utilization of execution resources.
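A rough way to see the difference is to count cycles for the same instruction stream under both policies. The sketch below is a deliberately simplified model and not a description of any shipping core: it issues at most one instruction per cycle, assumes every register is written only once (as if renaming had already happened), ignores retirement, and uses made-up latencies. The `Instr` tuple and `simulate` function are hypothetical names.

```python
from collections import namedtuple

# Hypothetical instruction record: destination register, source registers,
# and the number of cycles until the result is available.
Instr = namedtuple("Instr", "name dst srcs latency")


def simulate(program, out_of_order):
    """Return (cycle the last result is ready, issue order) for a single-issue machine."""
    # Results of not-yet-issued instructions are never ready; registers that no
    # instruction writes are treated as inputs that are ready from cycle 0.
    ready_at = {i.dst: float("inf") for i in program}
    pending = list(program)
    cycle, finish, order = 0, 0, []
    while pending:
        # In-order: only the oldest instruction may issue.
        # Out-of-order: the oldest instruction whose operands are ready may issue.
        candidates = pending if out_of_order else pending[:1]
        for instr in candidates:
            if all(ready_at.get(s, 0) <= cycle for s in instr.srcs):
                ready_at[instr.dst] = cycle + instr.latency
                finish = max(finish, cycle + instr.latency)
                order.append(instr.name)
                pending.remove(instr)
                break                     # at most one issue per cycle
        cycle += 1                        # an idle cycle here is a pipeline bubble
    return finish, order


program = [
    Instr("load r1", "r1", (), 10),           # assumed 10-cycle load (cache miss)
    Instr("add r2", "r2", ("r1",), 1),        # depends on the load
    Instr("mul r3", "r3", ("r0",), 3),        # independent of the load
    Instr("mul r4", "r4", ("r3",), 3),        # independent of the load
    Instr("add r5", "r5", ("r2", "r4"), 1),   # needs both chains
]

print("in-order:    ", simulate(program, out_of_order=False))
print("out-of-order:", simulate(program, out_of_order=True))
```

With this made-up program the in-order run finishes at cycle 18 and the out-of-order run at cycle 12, because the two multiplies slip in underneath the load instead of queuing up behind it. The size of the gap depends entirely on the latencies and on how much independent work is available, so don't read anything into the specific numbers.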

The move to an OoO paradigm generally comes with penalties to die area and power consumption, which is one reason the earliest mobile CPU architectures were in-order designs. The ARM11, ARM’s Cortex A8, Intel’s original Atom (Bonnell) and Qualcomm’s Scorpion core were all in-order. As performance demands continued to go up and with new, smaller/lower power transistors, all of the players here started introducing OoO variants of their architectures. Although often referred to as out of order designs, ARM’s Cortex A9 and Qualcomm’s Krait 200/300 are mildly OoO compared to Cortex A15. Intel’s Silvermont joins the ranks of the Cortex A15 as a fully out of order design by modern day standards. The move to OoO alone should be good for around a 30% increase in single threaded performance vs. Bonnell.

Pipeline

Silvermont changes the Atom pipeline slightly. Bonnell featured a 16 stage in-order pipeline. One side effect of the design was that all operations, including those that didn’t have cache accesses (e.g. operations whose operands were in registers), had to go through three data cache access stages even though nothing happened during those stages. In going out-of-order, Silvermont allows instructions to bypass those stages if they don’t need data from memory, effectively shortening the mispredict penalty from 13 stages down to 10. The integer pipeline depth now varies depending on the type of instruction, but you’re looking at a range of 14 - 17 stages.

Branch prediction, a staple of any progressive microprocessor architecture, improves tremendously with Silvermont. Silvermont takes Bonnell’s gshare branch predictor and significantly increases the size of all associated data structures. It also adds an indirect branch predictor. The combination of the larger predictors and the new indirect predictor should increase branch prediction accuracy.
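For readers who haven't met gshare before, here is a minimal textbook-style sketch: the branch address is XORed with a global history of recent outcomes to index a table of 2-bit saturating counters. The table size, the 2-bit counters, the class name and the helper are all assumptions for illustration. Intel hasn't disclosed the actual structure sizes, and this is not a model of Bonnell's or Silvermont's real predictor.

```python
class GsharePredictor:
    """Textbook gshare: (PC xor global history) indexes 2-bit saturating counters."""

    def __init__(self, index_bits=10):            # table size is an assumption
        self.mask = (1 << index_bits) - 1
        self.history = 0                          # recent outcomes, newest in bit 0
        self.counters = [1] * (1 << index_bits)   # start at "weakly not taken"

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2    # 2 or 3 means "taken"

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(predictor, pc, outcomes):
    """Fraction of outcomes predicted correctly, training the predictor as it goes."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(outcomes)


# A loop branch that is taken 7 times and then falls through, repeated.
loop_branch = [True] * 7 + [False]
print(accuracy(GsharePredictor(), 0x400, loop_branch * 200))
```

Growing `index_bits` is the toy equivalent of increasing the size of the associated data structures: more counters means fewer unrelated branches sharing an entry. Indirect branches (where the target, not just the direction, varies) are a separate problem that this sketch doesn't touch.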

Couple better branch prediction with a lower mispredict latency and you’re talking about another 5 - 10% increase in IPC over Bonnell.
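A back-of-the-envelope model shows how the two effects multiply. Only the 13 and 10 figures below come from the pipeline discussion above, and treating each stage as one lost cycle is itself a simplification. The base CPI, branch fraction and mispredict rates are hypothetical inputs picked purely for illustration, not measurements of either core.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Cycles per instruction once branch mispredict stalls are added in."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def ipc_gain(old_cpi, new_cpi):
    """Relative IPC improvement when moving from old_cpi to new_cpi."""
    return old_cpi / new_cpi - 1.0


# Hypothetical workload: 20% of instructions are branches, base CPI of 1.0.
# The mispredict rates (8% and 5%) are assumptions, not measured values.
older = effective_cpi(1.0, 0.20, 0.08, penalty_cycles=13)
newer = effective_cpi(1.0, 0.20, 0.05, penalty_cycles=10)

print(f"older CPI: {older:.3f}")
print(f"newer CPI: {newer:.3f}")
print(f"IPC gain:  {ipc_gain(older, newer):.1%}")
```

The takeaway is that the cost of branches is the product of how often you mispredict and how much each mispredict costs, so improving both at once compounds. Change the assumed rates and the gain moves quite a bit.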

174 Comments


  • althaz - Monday, May 6, 2013 - link

    I don't think you fully grasp the situation. Whilst Intel definitely can (and realistically should) take a strong leadership position in the mobile sector, companies like Qualcomm aren't going anywhere - Intel still won't (be able to?) compete on price, which means even if they take the lions-share of the market, there will be enough left for others to survive (they'll be a lot better off than AMD who sells more-expensive-to-manufacture chips for cheaper that perform worse and use more power).

    Although I wouldn't be too confident about nVidia, as they are yet to show they can compete with the likes of Qualcomm, let alone Intel.
  • R0H1T - Tuesday, May 7, 2013 - link

    They most certainly will not "take the lions-share of the market" because that belongs to the ultra thin margin chipmakers like Mediatek/Allwinner that deliver quad core ARM v7 based SoC in that 10~20$ range where Intel will not & cannot compete because of their relatively high(er) cost structure !
  • Khato - Tuesday, May 7, 2013 - link

    This is an argument that never makes sense to me. Yes, Intel won't go into a market unless the margins make it worthwhile... but do you not realize how cheap it is for Intel to make value processors on a deprecated node? Remember, Allwinner and Mediatek may operate on ultra thin margins, but that's in large part because the majority of the margins on their product go to the foundry they use. aka, when all the high end products are using Airmont cores Intel can keep making use of their 22nm capacity for awhile churning out 'old' Silvermont based products for the value market and simply get closer to the 'operating point' margin for that node.
  • R0H1T - Wednesday, May 8, 2013 - link

    I can't say how much TSMC charges for those chips but from what I know the single biggest cost of operations for Intel, outside of their R&D spending & foundry equipment upgrades, must be manpower & the difference between a Chinese/Taiwanese firm vs Intel in this particular dept would be a major one ! This is the real cost advantage that most smaller firms enjoy vis-a-vis Intel & for the foreseeable future they'll continue with this advantage.
  • xTRICKYxx - Tuesday, May 7, 2013 - link

    I didn't feel like this article is Intel PR crap. I read it all and I looked at all the improvements that are inbound; and I couldn't help but feel excited about Silvermont just like Anand.

    I cannot wait to see some benchmarks in the next few months.
  • Silma - Tuesday, May 7, 2013 - link

    On lack of AMD's comparison: there is nothing to compare and while one should tread cautiously with Intel's slides one should not tread at all with AMD's slides because AMD has a huge legacy of promises not held - how many times did we hear it would catch up in notebook or desktops, in performance or performance/watt. While Intel disappoints from time to time (Pentium 4) AMD disappoints most of the time, its last interesting product was the Opteron. Like most companies without vision it ends up doing stupid mergers instead of concentrating on core business.

    On Intel vs ARM. Silvermont looks promising but Intel needs to accelerate its roadmap. At the end of the year it probably won't compete against a 28nm A15. Qualcomm will not sleep for a year. Also it will have to invest heavily into marketing and OEM incentives if it seriously wants a share of the mobile pile. Will shareholders
  • ET - Tuesday, May 7, 2013 - link

    I'm excited. A 7-8" full Windows tablet with decent performance would be very neat. I'll wait to see what performance this gets in games. I don't need much, just enough to run adventure games and such.
  • R0H1T - Tuesday, May 7, 2013 - link

    Then get ready to shell out upwards of 500$ /:
  • pensive69 - Tuesday, May 7, 2013 - link

    can't stand getting a partially functioning market focused 'hack' on a cellphone.
    if the 22nm drill provides a full computer in a smaller form then factor me in!
    i don't care which firm does it...like those kids in the commercial
    we just want more we want more :).
    love it.
  • Laststop311 - Tuesday, May 7, 2013 - link

    this chip will have to pull off a miracle to drive full windows 8 and the everyday apps people use. Seems like it's going to average maybe slightly over 2x performance. That seems like a lot but when you see how poor current atoms are double that performance still is not enough. Does have potential in android phones/tablets and windows 8 phones/tablets as long as it's windows rt on the tablet. Atom still is not good enough for full windows 8
