The Sandy Bridge Preview
by Anand Lal Shimpi on August 27, 2010 2:38 PM EST
A New Architecture
This is a first. Usually when we go into these performance previews we’re aware of the architecture we’re reviewing; all we’re missing are the intimate details of how well it performs. This was the case for Conroe, Nehalem and Lynnfield (we sat Westmere out until final hardware was ready). Sandy Bridge is a different story entirely.
Here’s what we do know.
Sandy Bridge is a 32nm CPU with an on-die GPU. While Clarkdale/Arrandale have a 45nm GPU on package, Sandy Bridge moves the GPU transistors on die. Not only is the GPU on die but it shares the L3 cache of the CPU.
There are two different GPU configurations, referred to internally as 1 core or 2 cores. A single GPU core in this case refers to 6 EUs (execution units), Intel’s equivalent of a graphics processor core (NVIDIA would call them CUDA cores). Sandy Bridge will be offered in configurations with 6 or 12 EUs.
While the numbers may not sound like much, the Sandy Bridge GPU is significantly redesigned compared to what’s out currently. Intel already announced a ~2x performance improvement compared to Clarkdale/Arrandale, and I can say that, after testing Sandy Bridge, Intel has been able to achieve at least that.
Both the CPU and GPU on SB will be able to turbo independently of one another. If you’re playing a game that uses more GPU than CPU, the CPU may run at stock speed (or lower) and the GPU can use the additional thermal headroom to clock up. The same applies in reverse if you’re running something computationally intensive.
On the CPU side little is known about the execution pipeline. Sandy Bridge enables support for AVX instructions, just like Bulldozer. The CPU will also have dedicated video transcoding hardware to fend off advances by GPUs in the transcoding space.
Caches remain mostly unchanged. The L1 cache is still 64KB (32KB instruction + 32KB data) and the L2 is still a low latency 256KB. I measured them at 4 and 10 cycles respectively, the same as before. The L3 cache has changed, however.
Only the Core i7 2600 has an 8MB L3 cache; the 2400 and 2500 have a 6MB L3 and the 2100 has a 3MB L3. The L3 size should matter more with Sandy Bridge because it’s shared by the GPU in those cases where the integrated graphics is active. I am a bit puzzled why Intel strayed from the steadfast 2MB L3 per core that Nehalem’s lead architect wanted to commit to. I guess I’ll find out more from him at IDF :)
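To put the 2MB-per-core comparison in numbers, here is a quick sketch. The L3 sizes come from the paragraph above; the core counts (four for the 2600, 2500 and 2400, two for the 2100) are not stated in this section and are assumed here, and `l3_per_core` is just an illustrative helper name.

```python
# L3 sizes (MB) are taken from the article text.
# Core counts are ASSUMED for illustration; this section does not state them.
parts = {
    "2600": {"l3_mb": 8, "cores": 4},
    "2500": {"l3_mb": 6, "cores": 4},
    "2400": {"l3_mb": 6, "cores": 4},
    "2100": {"l3_mb": 3, "cores": 2},
}


def l3_per_core(l3_mb, cores):
    """Return the amount of L3 cache (in MB) available per CPU core."""
    return l3_mb / cores


per_core = {name: l3_per_core(**spec) for name, spec in parts.items()}

for name, mb in per_core.items():
    print(f"{name}: {mb:.1f} MB L3 per core")
```

Under those assumed core counts, only the 2600 keeps 2MB per core; the other three parts work out to 1.5MB per core.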
The other change appears to be in L3 cache latency, prefetcher aggressiveness, or both. Although most third party tools don’t accurately measure L3 latency, they can usually give you a rough idea of latency changes between similar architectures. In this case I turned to cachemem, which reported Sandy Bridge’s L3 latency as 26 cycles, down from ~35 in Lynnfield (Lynnfield’s actual L3 latency is 42 clocks).
As I mentioned before, I’m not sure whether this is the result of a lower latency L3 cache or more aggressive prefetchers, or both. I had limited time with the system and was unfortunately unable to do much more.
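For readers wondering why prefetchers muddy a latency number at all: a common way to estimate memory latency is pointer chasing, where each load's address depends on the previous load and the visiting order is a single random cycle, so a stride prefetcher has little to predict. The sketch below shows only that access pattern. It is not a description of how cachemem works internally, the function names are made up for illustration, and in Python the interpreter overhead dominates, so the printed times are not cycle counts; a real tool would do this in C or assembly.

```python
import random
import time


def build_chain(n, seed=0):
    """Sattolo's algorithm: a random permutation forming one cycle of length n."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i (never i itself), which guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Follow the chain; every lookup depends on the result of the previous one."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]
    return idx


def time_per_step(n, steps=200_000):
    """Average wall-clock seconds per dependent lookup for a chain of n entries."""
    nxt = build_chain(n)
    begin = time.perf_counter()
    chase(nxt, steps)
    return (time.perf_counter() - begin) / steps


if __name__ == "__main__":
    # Larger chains spill out of smaller caches; sizes here are arbitrary.
    for n in (1 << 10, 1 << 14, 1 << 17):
        print(f"{n:>7} entries: {time_per_step(n) * 1e9:.1f} ns per step")
```

A native version of the same loop would show a step up in time per access each time the chain outgrows a cache level. A more aggressive prefetcher can flatten those steps for predictable patterns, which is why a lower reported L3 figure alone cannot separate the two explanations.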
And that’s about it. I can fit everything I know about Sandy Bridge onto a single page and even then it’s not telling us much. We’ll certainly find out more at IDF next month. What I will say is this: Sandy Bridge is not a minor update. As you’ll soon see, the performance improvements the CPU will offer across the board will make most anyone want to upgrade.
200 Comments
DanNeely - Friday, August 27, 2010
Maybe, but IIRC Apple's biggest issue with the Clarkdale platform on smaller laptops was wanting to maintain CUDA support across their entire platform without adding a 3rd chip to the board, not general GPU performance. Unless the Intel/nVidia lawsuit concludes with nVidia getting a DMI license or Intel getting a CUDA license this isn't going to change.
Pinski - Saturday, August 28, 2010
I don't think it has anything to do with CUDA. I mean, they sell Mac Pros with AMD/ATI Cards in them, and they don't support CUDA. It's more of OpenCL and high enough performance. However, just looking at this new performance, I'm willing to say that it'll be the next chip for the MBP 13" easily.
Pinski - Saturday, August 28, 2010
Well, wait never mind. Apparently it doesn't support OpenCL, which basically puts it out of the picture for Apple to use.
starfalcon - Saturday, August 28, 2010
Hmm, they really want all of the systems to have OpenCL?
I don't have OpenCL and I don't care at all and I have CUDA but have only used it once.
320M doesn't even have OpenCL, does it?
Seems like it would be ok for the less expensive ones to have Intel graphics and the higher end ones to have CUDA, OpenCL, and better gaming performance if someone cares about those.
They'll keep on upgrading the performance and features of Intel graphics though, who knows.
Veerappan - Thursday, September 2, 2010
No, just ... no.
Nvidia implements an OpenCL run-time by translating OpenCL API calls to CUDA calls. If your card supports CUDA, it supports OpenCL.
The 320M supports OpenCL, and every Apple laptop/desktop that has shipped in the last few years has as well.
A large portion of the motivation for OS X 10.6 (Snow Leopard) was introducing OpenCL support, along with increasing general performance.
There is a large amount of speculation that OS X 10.7 will take advantage of the OpenCL groundwork that OS X 10.6 has put in place.
Also, in the case that you have a GPU that doesn't support OpenCL (older Intel Macs with Intel IGP graphics), Apple has written a CPU-based OpenCL run-time. It'll be slower than GPU, but the programs will still run. That being said, I highly doubt that Apple will be willing to accept such a performance deficit existing in a brand new machine compared to prior hardware.
Penti - Saturday, August 28, 2010
It has more to do with nVidia's VP3 PureVideo engine which they rely on for video acceleration. It's as simple as that.
Which is why they only find their place in the notebooks. It's also a low-end GPU with enough performance to, say, run a Source game at low res. And they have more complete drivers for OS X.
CUDA is a third party add on. OpenCL isn't.
burek - Friday, August 27, 2010
Will there be a "cheap" (~$300) 6-core LGA-2011 replacement for i7 920/930 or will Intel limit the 6/8 cores to the high-end/extreme price segment ($500+)?
DJMiggy - Friday, August 27, 2010
yea I doubt that will happen. It would be like trying to SLI/crossfire an nvidia to an ati discrete. You would need a special chip like the hyrda one.
DJMiggy - Friday, August 27, 2010
Hydra even. Hydra Lucid chip.
Touche - Friday, August 27, 2010
Questionable overclocking is bad enough, but together with...
"There’s no nice way to put this: Sandy Bridge marks the third new socket Intel will have introduced since 2008."
"The CPU and socket are not compatible with existing motherboards or CPUs. That’s right, if you want to buy Sandy Bridge you’ll need a new motherboard."
"In the second half of 2011 Intel will replace LGA-1366 with LGA-2011."
...it is just terrible!
I'll definitely buy AMD Bulldozer, even if it ends up a bit slower. At least they have some respect for their customers and an ability of forward thinking when designing sockets (actually, Intel probably has it too, but just likes to milk us on chipset purchases also). And I am no fanboy, 4 of my 7 PCs are Intel based (two of those 4 were my latest computer purchases).