The Era of Tera: Intel Reveals more about 80-core CPU
by Anand Lal Shimpi on February 11, 2007 5:44 PM EST
Posted in CPUs
Clocks and Power Management
In a modern microprocessor, making sure the clock signal arrives at the same time across all parts of the chip can be a difficult task for a designer, especially as CPU frequencies and chip area both increase. However, it's a necessary part of chip design, as the clock needs to arrive at all parts of the chip within tight tolerances for the CPU to behave normally. Intel tells us that in modern microprocessors, clock distribution is responsible for approximately 30% of a chip's total power consumption, so any power savings made here are significant.
The teraflop chip, however, isn't a conventional chip; as each tile is independent, the clock only needs to arrive at all parts of a 3mm^2 tile at the same time, not across the entire 275mm^2 chip. With this in mind, Intel designed the teraflop chip to allow the clock to arrive at individual tiles out of phase. This approach means that tile-to-tile communication may end up being a bit slower than it otherwise could be, but the power savings are tremendous. Intel estimates that the power required to distribute the clock to all of the tiles on the teraflop chip at 4GHz is 2.2W, or 1.2% of total power consumption under load.
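As a rough check on those figures, the sketch below back-computes the total load power implied by 2.2W being 1.2% of the total, then shows what a 30% clock-distribution share would cost at that same total. The function names are ours, and applying the 30% figure to this chip's power budget is an assumption made purely for illustration, not something Intel stated.

```python
def implied_total_power(clock_watts: float, clock_share: float) -> float:
    """Total chip power implied by a clock power figure and its share of the total."""
    return clock_watts / clock_share


def clock_power_at_share(total_watts: float, share: float) -> float:
    """Clock distribution power if it consumed `share` of `total_watts`."""
    return total_watts * share


# 2.2W and 1.2% are the figures quoted in the article for 4GHz operation.
total = implied_total_power(2.2, 0.012)

# Assumption: the "approximately 30%" share of a conventional design,
# applied to the same total power for comparison only.
conventional = clock_power_at_share(total, 0.30)

print(f"Implied total power under load: {total:.0f} W")
print(f"Clock power at a 30% share:     {conventional:.0f} W")
```

Under these assumptions the implied total is a little over 180W, and a conventional 30% share of that would be tens of watts rather than 2.2W.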
Obviously, if we had a network of more complicated cores, distributing the clock within a larger, more complex tile would require more power than this, but the takeaway is that in a network like this you can simplify overall chip clock distribution by only worrying about the clock within a tile.
Clock management isn't the only area where Intel looked to save power, as the teraflop chip architecture is very power conscious in its design. Each tile is divided up into 21 individual sleep regions that can be powered down independently depending on the type of instruction being executed, not to mention that individual tiles can be powered down independently of one another. And as we mentioned before, the PE and router on each tile can be powered down independently.
Within the router itself, each one of the five ports can be powered down independently as well. With 80 cores, the teraflop chip can also redistribute load according to thermal needs. If a handful of tiles are getting too hot, the chip can dynamically wake up a different set of tiles to begin working in order to avoid creating hotspots.
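The thermal redistribution described above can be pictured with a toy model. This is a minimal sketch, not Intel's mechanism: the `Tile` class, the temperature readings, the 85C limit and the "wake the coolest sleeping tile" policy are all hypothetical, since the article does not describe how the chip actually chooses replacement tiles.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    tile_id: int
    temperature_c: float  # hypothetical sensor reading
    awake: bool = False


def rebalance(tiles, limit_c):
    """Put awake tiles hotter than limit_c to sleep and wake cool sleeping tiles.

    Each hot tile is paired with the coolest available sleeping tile.
    Returns a list of (hot_tile_id, replacement_tile_id) pairs.
    """
    hot = [t for t in tiles if t.awake and t.temperature_c > limit_c]
    spare = sorted(
        (t for t in tiles if not t.awake and t.temperature_c <= limit_c),
        key=lambda t: t.temperature_c,
    )
    migrations = []
    # zip stops at the shorter list, so hot tiles without a spare stay awake.
    for hot_tile, cool_tile in zip(hot, spare):
        hot_tile.awake = False
        cool_tile.awake = True
        migrations.append((hot_tile.tile_id, cool_tile.tile_id))
    return migrations


tiles = [
    Tile(0, 95.0, awake=True),
    Tile(1, 60.0, awake=True),
    Tile(2, 55.0),
    Tile(3, 50.0),
]
moves = rebalance(tiles, limit_c=85.0)
print(moves)
```

In this example tile 0 is over the limit, so its work moves to tile 3, the coolest sleeping tile, while tile 1 keeps running. A real scheduler would also have to account for the cost of moving state between tiles, which this sketch ignores.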
The FPMACs remain in sleep mode until they are needed. That adds a latency penalty when waking them up, but it prevents power consumption from spiking as soon as there's load, which can help simplify power delivery and other elements of the chip. Approximately 90% of the FPMAC logic and a total of 74% of each PE use sleep transistors to help reduce power consumption, as we described above. Intel states that sleep transistors take up, on average, 5.4% more die area than regular transistors and come with a 4% frequency penalty, but the power savings are worth it. Sleep transistors are used in other Intel processors, including the Core 2 family.
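To get a feel for what those percentages cost, here is a back-of-the-envelope sketch. It rests on two assumptions that the article does not confirm: that the 5.4% area overhead applies uniformly to whatever fraction of the logic is gated, and that the 4GHz operating point already includes the 4% frequency penalty. The function names are ours.

```python
def gated_area_overhead(coverage: float, overhead: float) -> float:
    """Fractional area growth of a block when `coverage` of it uses sleep transistors."""
    return coverage * overhead


def ungated_frequency(gated_ghz: float, penalty: float) -> float:
    """Frequency the same design might reach without the sleep-transistor penalty."""
    return gated_ghz / (1.0 - penalty)


# Coverage, overhead and penalty figures are the ones quoted in the article.
pe_overhead = gated_area_overhead(0.74, 0.054)
fpmac_overhead = gated_area_overhead(0.90, 0.054)
no_sleep_ghz = ungated_frequency(4.0, 0.04)

print(f"PE area overhead:    {pe_overhead:.1%}")
print(f"FPMAC area overhead: {fpmac_overhead:.1%}")
print(f"Frequency without sleep transistors: {no_sleep_ghz:.2f} GHz")
```

Under these assumptions the cost is roughly 4% of PE area and a bit under 0.2GHz of clock speed, which is consistent with Intel's view that the power savings are worth it.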
25 Comments
creathir - Monday, February 12, 2007 - link
With all of this wonderful multi-core bliss, is the software side of things. Multicore means the software needs to be written asymentrically. This will be an incredibly hard challenge, especially on real time applications such as video games. The concept is fantastic, but the proof is in the pudding as they say. I do find Intel's routing technology to be quite interesting, especially the idea of stacking the L1/L2 memory on top (or below rather;)) the cores. The interconnect on them, how would this work exactly I wonder? Should be interesting to see what all 3 of these companies come out with in the coming years. I suppose the nay sayers of Moore's law will be once again disproven...
- Creathir
Goty - Sunday, February 11, 2007 - link
So basically it's a Cell processor on steroids. Other than the chip stacking, what's so new about it? People have been talking about 3D packaging for a year or two now, and not just Intel.
SocrPlyr - Monday, February 12, 2007 - link
In a way, yes. And in a lot of ways, no. Yes the individual tiles are floating point units, but this chip is not meant to be a functional replacement for anything like the cell is trying to be. You really cannot compare this chip to anything available on the market. It is only a proof of concept. The choice of tiles that are floating point units was probably due to the fact that ultra high performance needs generally are nearly completely FP dependent. When testing and playing with this thing those types of applications are easy to come by. To be honest this chip seems a lot like a DSP chip, and mentioning those you will realize that the Cell processor is anything but an altered one of those. Really on all sides there has been little technology that is completely new, just better implementations.
oldhoss - Sunday, February 11, 2007 - link
I'll bet that SOB would give two 8800GTX's a run for their money....CPU-limited be damned! ;-D
mino - Sunday, February 11, 2007 - link
"Since the per-die area doesn't increase, the number of defects don't go up per die."
Any sensible person knows that defect-rate is(mostly) dependent on the number of functional units(i.e. transistors), provided the defect-rate off a single unit is set.
The fact that it is NOW mostly tied to die-area is caused exactly by the fact we do NOT use stacked-die aproach yet.
Otherwise a nice news piece. Thanks AT.
mino - Sunday, February 11, 2007 - link
sorry for typpos...
notposting - Sunday, February 11, 2007 - link
The picture above this shows the Terascale slide:
http://images.anandtech.com/reviews/cpu/intel/tera...
sprockkets - Sunday, February 11, 2007 - link
We have a solution to the problem of ever increasing CPU speed. My question is, who here needs it?
For those who need to open 80 Firefox tabs, video encoding, virus scanning and watching a HD movie, at the same time?
Data sets did need to get bigger, but check this out: Music files started out at small sampling rates till about Win98 they got to the cd standard. It stopped there since no one needs it bigger than that, that is, 44.1khz and 16 bit resolution. If you can hear 96/192khz 24bit music better, fine, but we have others saying that 128kbps mp3 was cd quality.
Video resolutions made their way from 640x480 to now around 1600x1200, and widescreen varients of that. Color depth sits at around 32bit. Can you see it improving beyond that?
OK, so we can what, go 3D now, holographic?
Sorry to you Intel and AMD, but the vast majority of the people you sell your technology to can live off a $30 processor and $50 of RAM, the smallest HDD, and a $30 optical drive which does everything.
Would be cool to see a motherboard with built in DDR3 or 4 memory for the cpu/gpu AMD Fusion core, and have 2GB of it, with 32GB of flash built on as well. Let's go for silent computing, you know, back in the day when all processors only had tiny heatsinks on them!!!
joex444 - Monday, February 12, 2007 - link
What part of the article was confusing to you?
NOT FOR RETAIL SALE, COMMERCIAL USES ONLY.
I got the idea, guess you didn't. PWNT!
Larso - Monday, February 12, 2007 - link
So, why did we ever bother invent plastic materials? Or why invent the laser? The laser is a good example of an invention that was expected to be a useless curiosity but turned out to be hugely usefull.
But this case isn't even comparable to that. There are indeed problems waiting to be solved with this solution. All servers with more than a handfull of CPU's could be cut down in size and power usage tremendously, and what about supercomputers? They are going to be extra extremely powerfull when they change to this kind of cpu's.
And by the way, you have to be quite narrowminded to not see the (sales) potential of supercomputing at home. Lets have computer games with scary intelligent AI's :)