Fall IDF 2005 - Day 1: Intel's New Architecture Details Revealed
by Anand Lal Shimpi on August 23, 2005 1:17 PM EST - Posted in Trade Shows
Immediately following the slightly disappointing keynote, Intel revealed a few more details about its next-generation microprocessor architecture. The new architecture doesn't have a name yet, but we now know some of its features.
Intel has said that the next-gen microarchitecture will be a unified architecture, combining the lessons learned from the Pentium 4's NetBurst and Pentium M's Banias architectures. To put it bluntly, the next-generation microprocessor architecture borrows the FSB and 64-bit capabilities of NetBurst and combines them with the power saving features of the Pentium M platform. Features like virtualization and security will also be part of the new architecture.
Contrary to wild speculation, Intel's new architecture will continue to feature an out-of-order execution core, a direct descendant of its Pentium M and Pentium 4 predecessors. It will be a wider 4-issue core (4-issue decode, execute and retire) with deeper buffers, presumably keeping more instructions in flight than the Pentium 4 courtesy of the wider issue width.
The basic integer pipeline appears to be 14 stages long, a significant decrease from the 31+ stage pipeline in Prescott and a slight increase over the 12-stage pipeline in the Athlon 64. Intel's move to a much shorter pipeline will definitely decrease power consumption (as well as clock speed), but should hopefully improve performance considerably.
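To see why pipeline depth matters, here is a back-of-the-envelope sketch of the usual first-order model, in which every mispredicted branch costs roughly one pipeline's worth of cycles. The function name, the base CPI, the branch fraction and the misprediction rate are all illustrative assumptions of ours, not Intel or AMD figures; only the stage counts come from the text above.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """First-order cycles-per-instruction estimate including branch mispredict stalls."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: ideal CPI of 1.0, 20% branches, 5% of them mispredicted.
WORKLOAD = dict(base_cpi=1.0, branch_fraction=0.20, mispredict_rate=0.05)

# Simplification: the mispredict penalty is taken to be the full pipeline depth.
for name, stages in [("Prescott", 31), ("next-gen", 14), ("Athlon 64", 12)]:
    cpi = effective_cpi(penalty_cycles=stages, **WORKLOAD)
    print(f"{name:10s} {stages:2d} stages -> CPI ~ {cpi:.2f}")
```

Under these made-up numbers, the 31-stage design loses roughly 0.3 cycles per instruction to mispredictions against about 0.14 for a 14-stage design. That is the trade: the shorter pipeline gives up clock speed but gets more done per clock.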
Note that with a 4-issue core, the new processors will actually have a higher degree of ILP than AMD's Athlon 64, and with a slightly deeper pipeline the CPU should be able to reach higher clock speeds than what AMD has been able to achieve. We'd expect that at 65nm these new cores could run as high as 3GHz in clock speed, but definitely not at the 4GHz+ levels that we currently have with the Pentium 4.
Given the significant reduction in pipeline stages, Intel's claims of a 5x improvement in performance per watt over the Pentium 4 architecture seem very realistic.
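A rough way to reason about the performance-per-watt claim is the standard first-order CMOS dynamic power model. This is a textbook approximation that ignores leakage, not anything Intel presented; here α is the activity factor, C the switched capacitance, V the supply voltage, f the clock frequency and IPC the instructions completed per clock.

```latex
\[
P_{\text{dyn}} \approx \alpha C V^{2} f,
\qquad
\text{Perf} \approx \text{IPC} \cdot f
\quad\Longrightarrow\quad
\frac{\text{Perf}}{P_{\text{dyn}}} \approx \frac{\text{IPC}}{\alpha C V^{2}}
\]
```

Clock frequency cancels out in this simplified model, so gains in performance per watt have to come from higher IPC, lower voltage, or less switched capacitance. A shorter, wider pipeline running at a lower clock plausibly helps on all three counts, though how much is something only real silicon will tell us.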
The new architecture will feature a shared L2 cache between the cores, much like what we've seen from Yonah already. Intel also said that there would be a higher "relative" increase in L2 cache bandwidth. The new processors will also apparently feature a direct L1-to-L1 cache transfer system in order to improve the currently very poor cache-to-cache transfer performance of Intel's dual core processors.
There are also a number of new prefetching algorithms, allowing data to be prefetched from L1 to L1 (one core to another), L1 to L2, and so on. Intel is also introducing speculative data loads with the new architecture, allowing loads to be executed ahead of stores if no dependency is predicted to exist between the two. We are waiting for more details on the feature before we can be exact about its functionality.
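Intel hasn't said how the dependency prediction works, so the following is only a toy sketch of the general idea: predict per load whether it conflicts with an older pending store, hoist it if not, and replay it when the guess turns out wrong. The class and function names, the per-load saturating counter and its thresholds are all our own assumptions for illustration, not a description of Intel's design.

```python
class LoadStorePredictor:
    """Toy memory-dependence predictor: one saturating counter per load address (PC)."""

    def __init__(self, threshold=2, max_count=3):
        self.counters = {}          # load PC -> conflict counter
        self.threshold = threshold  # at or above this, stop hoisting the load
        self.max_count = max_count

    def predict_independent(self, load_pc):
        # Loads we have never seen start out optimistic (counter of 0).
        return self.counters.get(load_pc, 0) < self.threshold

    def train(self, load_pc, conflicted):
        count = self.counters.get(load_pc, 0)
        if conflicted:
            count = min(count + 1, self.max_count)
        else:
            count = max(count - 1, 0)
        self.counters[load_pc] = count


def execute_load(predictor, load_pc, load_addr, pending_store_addrs):
    """Return (hoisted, replayed) for one load while older stores are still pending."""
    conflict = load_addr in pending_store_addrs
    hoisted = predictor.predict_independent(load_pc)
    replayed = hoisted and conflict  # speculated past a store it depended on
    predictor.train(load_pc, conflict)
    return hoisted, replayed


predictor = LoadStorePredictor()
# A load that never aliases a pending store is hoisted with no penalty.
print(execute_load(predictor, 0x40, 0x1000, {0x2000}))
# A load that keeps aliasing is replayed a couple of times, then held back.
for _ in range(3):
    print(execute_load(predictor, 0x80, 0x3000, {0x3000}))
```

The point of the sketch is the asymmetry: a correct "independent" guess lets the load run early for free, while a wrong one costs a replay, so the predictor learns to hold back the loads that keep colliding.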
Both Conroe and Merom (desktop and mobile) will feature 2 cores. Intel says that Conroe will be available in multiple L2 cache sizes, while Merom will not. We'd assume that the multiple L2 cache sizes would be to accommodate and differentiate products like the Extreme Edition.
On the server side, the first next-gen architecture CPU will be the dual core Woodcrest, followed by the quad-core Whitefield processor.
More info as we get it.
26 Comments
BitByBit - Tuesday, August 23, 2005
So, no mention of Hyperthreading so far. It seems reasonable to assume that Intel has either found a way to temporarily deactivate idle execution units, or has implemented a greatly improved Out-of-order engine able to keep all four (integer units) busy. Considering the increase in power draw additional execution units cause, it seems likely that with Conroe's emphasis on Performance per Watt, Intel felt they could widen the core without sacrificing efficiency.
So, how have they managed this?
JarredWalton - Tuesday, August 23, 2005
In the past, I've read that there are two main reasons for doing SMT. One is that you have a deep pipeline and want to keep the penalty of mispredicted branches and stalls to a minimum. The other is if you have a really wide core and lots of execution units, and you want to keep them filled. I'd say there's still a reasonable chance that we'll see some form of SMT in Conroe, though it could be like the original HTT where it sits inactive initially and Intel only turns it on after further testing. (/speculation)
IntelUser2000 - Wednesday, August 24, 2005
Pentium 4 doesn't have a 4-issue wide core. Why did Anand assume that?
Or perhaps it means more instructions in flight than Pentium 4 because Conroe has a 4-issue core?
BitByBit - Wednesday, August 24, 2005
The number of instructions active in the pipeline at any one time is determined by the issue rate and the pipeline length. Although the P4 had three integer units, its decode rate was actually quite low, as a result of its single decoder (and Trace Cache).
Now that we know Conroe is a four-issue design, it could well have more instructions in flight than the P4, with deeper re-order buffers in order to extract a sufficient amount of ILP from the instruction stream to keep its execution units busy.
UNCjigga - Tuesday, August 23, 2005
I'm guessing we'll have ~2.0 to ~2.6GHz at launch, maybe ~1.6 to ~2GHz for the mobile parts?
coldpower27 - Tuesday, August 23, 2005
Nah, this is a 65nm process, we're looking at low to mid 2GHz for the mobile chips with mid to high 2GHz for the desktop revisions, also as time goes on they should breach 3GHz on this processor for desktop.
Leper Messiah - Tuesday, August 23, 2005
Something like that. Maybe not even that high, with a slightly longer pipeline starting maybe 2.4GHz max. Wee for integer performance. How 'bout some concrete benchies if they actually have some out.
Hacp - Tuesday, August 23, 2005
Why only 2.4GHz? I think Intel could start at 2.8, go up to 3.2-3.4 and work their way down to 2.4...
Doormat - Tuesday, August 23, 2005
I really thought Intel would have put that in this generation. I think the two cores will be beating each other up for the memory controller in peak usage scenarios. Even with a 1066MHz FSB and dual channel DDR 667.
UNCjigga - Tuesday, August 23, 2005
Where does it say they lack an on-die memory controller? Methinks direct L1-to-L1 transfer and unified L2 cache means something 'intelligent' sits between the cores?