The Sandy Bridge Preview
by Anand Lal Shimpi on August 27, 2010 2:38 PM EST
A New Architecture
This is a first. Usually when we go into these performance previews we’re aware of the architecture we’re reviewing; all we’re missing are the intimate details of how well it performs. This was the case for Conroe, Nehalem and Lynnfield (we sat Westmere out until final hardware was ready). Sandy Bridge is a different story entirely.
Here’s what we do know.
Sandy Bridge is a 32nm CPU with an on-die GPU. While Clarkdale/Arrandale put a 45nm GPU on a separate die in the same package, Sandy Bridge moves the GPU onto the CPU die itself. Not only is the GPU on die, it also shares the CPU’s L3 cache.
There are two different GPU configurations, referred to internally as 1 core or 2 cores. A single GPU core in this case refers to 6 EUs (execution units), Intel’s equivalent of what NVIDIA would call CUDA cores. Sandy Bridge will be offered in configurations with 6 or 12 EUs.
While the numbers may not sound like much, the Sandy Bridge GPU is significantly redesigned compared to what’s out currently. Intel already announced a ~2x performance improvement compared to Clarkdale/Arrandale, and after testing Sandy Bridge I can say that Intel has achieved at least that.
Both the CPU and GPU on SB will be able to turbo independently of one another. If you’re playing a game that uses more GPU than CPU, the CPU may run at stock speed (or lower) and the GPU can use the additional thermal headroom to clock up. The same applies in reverse if you’re running something computationally intensive.
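The shared-headroom idea can be sketched with a toy model. This is a minimal illustration under loudly stated assumptions, not Intel's actual turbo algorithm: the package limit, the linear "MHz per spare watt" rule, every wattage and clock number, and the function name `turbo_clock` are all hypothetical. Whatever power the other unit leaves unused becomes headroom, and each spare watt buys a fixed amount of extra clock up to a cap.

```python
def turbo_clock(package_limit_w, other_unit_w, own_base_w,
                base_mhz, max_mhz, mhz_per_watt):
    """Toy model (hypothetical, not Intel's algorithm) of one unit's turbo clock.

    The unit (GPU or CPU) keeps its base power draw and may spend whatever
    the package limit leaves over after the other unit's draw. Each spare
    watt adds a fixed number of MHz, capped at the unit's maximum clock.
    """
    headroom_w = max(0.0, package_limit_w - other_unit_w - own_base_w)
    return min(max_mhz, base_mhz + headroom_w * mhz_per_watt)


# Illustrative numbers only: a 95 W package, a GPU base draw of 15 W,
# GPU clocks between 800 and 1200 MHz, 20 MHz gained per spare watt.
for label, cpu_w in [("GPU-heavy game", 40), ("mixed load", 70), ("CPU-bound encode", 80)]:
    mhz = turbo_clock(95, cpu_w, 15, 800, 1200, 20)
    print(f"{label}: CPU draws {cpu_w} W -> GPU runs at {mhz:.0f} MHz")
```

Swapping the roles (passing the GPU's draw as `other_unit_w` and the CPU's clocks as the limits) models the reverse case, where a computationally intensive workload lets the CPU clock up instead.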
On the CPU side, little is known about the execution pipeline. Sandy Bridge adds support for AVX instructions, just like Bulldozer. The CPU will also have dedicated video transcoding hardware to fend off advances by GPUs in the transcoding space.
Caches remain mostly unchanged. The L1 cache is still 64KB (32KB instruction + 32KB data) and the L2 is still a low-latency 256KB. I measured their latencies at 4 and 10 cycles respectively, unchanged from before. The L3 cache has changed, however.
Only the Core i7 2600 has an 8MB L3 cache; the 2400 and 2500 have a 6MB L3 and the 2100 has a 3MB L3. The L3 size should matter more with Sandy Bridge because it’s shared with the GPU whenever the integrated graphics is active. I am a bit puzzled why Intel strayed from the steadfast 2MB of L3 per core that Nehalem’s lead architect wanted to commit to. I guess I’ll find out more from him at IDF :)
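The arithmetic behind the 2MB-per-core remark is easy to check. The L3 sizes below come from the paragraph above; the core counts (four cores for the i7 2600 and the i5 2400/2500, two for the i3 2100) are not stated in this preview and are added here as an assumption based on the shipping parts.

```python
# L3 sizes in MB are from the text; core counts are an added assumption.
L3_MB = {"Core i7 2600": 8, "Core i5 2500": 6, "Core i5 2400": 6, "Core i3 2100": 3}
CORES = {"Core i7 2600": 4, "Core i5 2500": 4, "Core i5 2400": 4, "Core i3 2100": 2}


def l3_per_core_mb(model: str) -> float:
    """Share of the L3 available per CPU core, ignoring what the GPU uses."""
    return L3_MB[model] / CORES[model]


for model in L3_MB:
    print(f"{model}: {l3_per_core_mb(model):.1f} MB L3 per core")
```

Under those assumptions only the 2600 keeps 2MB per core; the other parts drop to 1.5MB per core before the GPU claims any share of the cache.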
The other change appears to be either L3 cache latency or prefetcher aggressiveness, or both. Although most third-party tools don’t accurately measure L3 latency, they can usually give you a rough idea of latency changes between similar architectures. In this case I turned to cachemem, which reported Sandy Bridge’s L3 latency as 26 cycles, down from ~35 in Lynnfield (Lynnfield’s actual L3 latency is 42 clocks).
As I mentioned before, I’m not sure whether this is the result of a lower latency L3 cache or more aggressive prefetchers, or both. I had limited time with the system and was unfortunately unable to do much more.
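Why a latency tool can confuse a faster L3 with more aggressive prefetching is easier to see from how such tools commonly work. We don't know cachemem's internals, so this is a generic sketch of the usual pointer-chasing technique, not its implementation: the benchmark walks a chain of indices through a buffer sized to a given cache level, and each load depends on the previous one. If the chain runs sequentially, a hardware prefetcher can fetch ahead and hide latency; a random single-cycle chain, built here with Sattolo's algorithm, defeats most prefetching. Python is far too slow to time real cache latency, so this only shows how the access pattern is built; a real tool would do the walk in C or assembly and divide elapsed cycles by the number of steps.

```python
import random


def sequential_chain(n: int) -> list[int]:
    """Prefetcher-friendly chain: slot i points to slot i + 1 (wrapping)."""
    return [(i + 1) % n for i in range(n)]


def sattolo_chain(n: int, seed: int = 0) -> list[int]:
    """Random chain that visits all n slots in a single cycle.

    Sattolo's algorithm is Fisher-Yates with j drawn from [0, i) instead of
    [0, i], which guarantees the resulting permutation is one n-cycle.
    """
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain: list[int], steps: int, start: int = 0) -> int:
    """Follow the chain; every step depends on the result of the previous one."""
    i = start
    for _ in range(steps):
        i = chain[i]
    return i


def cycle_length(chain: list[int], start: int = 0) -> int:
    """Number of steps needed to return to the starting slot."""
    i, length = chain[start], 1
    while i != start:
        i = chain[i]
        length += 1
    return length


n = 4096
for name, chain in [("sequential", sequential_chain(n)), ("random", sattolo_chain(n))]:
    print(name, cycle_length(chain), chain[:6])
```

Both chains touch every slot exactly once per lap, so they cover the same footprint; only the order differs, which is exactly what separates raw cache latency from what a prefetcher can hide.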
And that’s about it. I can fit everything I know about Sandy Bridge onto a single page and even then it’s not telling us much. We’ll certainly find out more at IDF next month. What I will say is this: Sandy Bridge is not a minor update. As you’ll soon see, the performance improvements the CPU will offer across the board will make most anyone want to upgrade.
200 Comments
Anand Lal Shimpi - Friday, August 27, 2010
We'll have to wait a little bit to find out... :)
hnzw rui - Friday, August 27, 2010
If it doubles Clarkdale's GPU performance, then it probably will (at least at lower resolutions). I'm getting pretty decent framerates from Clarkdale at 1360x768 Low, and I've been able to play at 1360x768 Medium with a Radeon HD 4550. I think Sandy Bridge is probably closer to the latter than the former in performance.
SteelCity1981 - Friday, August 27, 2010
Any word on when Intel will launch a mobile version of this new platform?
Anand Lal Shimpi - Friday, August 27, 2010
Q1 2011 :)
BSMonitor - Friday, August 27, 2010
So glad I waited on a 13" MacBook Pro! Sandy Bridge will probably be the next revision for MacBooks, eh?
cheinonen - Friday, August 27, 2010
Exactly my thoughts: the GPU performance looks good enough that Apple could use it for the 13" MBP refresh next year. I'll be glad I decided to wait, that's for sure.
synergist - Friday, August 27, 2010
I was doing some research, and Apple would have to use the full integrated graphics core, with 12 cores, to top the performance of the 320M in the current 13" MacBook Pro. I doubt Apple would take a step backwards in graphics performance and use the 6-core integrated graphics. Even then the performance would be pretty close, with the 320M losing to the 12-core integrated graphics by about 10-13%.
And Llano is still an option, but I have a feeling it will be a dead heat with this.
starfalcon - Friday, August 27, 2010
Apple has to know a lot of the people buying these laptops are far from high-end gamers. The number of people with 320Ms who don't need them is probably large.
We'll see how all the different parts of Sandy Bridge work out.
Isn't it the case that the Core iX processors don't work with Nvidia integrated graphics at all?
JarredWalton - Friday, August 27, 2010
Correct on NVIDIA IGPs not working with Core 2010 (and presumably beyond). They need the QPI interface and Intel isn't licensing that to them.
As for Apple, one thing to note is that they seem to have started shipping all laptops with GPUs that support OpenCL, so if Sandy Bridge doesn't have that they may use the same Optimus-style setup as the current MBP. Not sure what they'd do with the base MacBook in that case, but Apple seems to be gearing up to start leveraging OpenCL at some point in the near future. Pure speculation, though, so maybe SB's IGP will be enough, even if it's a step down from the G320M.
DanNeely - Friday, August 27, 2010
Aside from the high end (LGA 1366/2011), the bus NVIDIA needs is DMI, not QPI. If I were NVIDIA I'd insist on getting rights to both, because QPI is more future-proof. Specifically, more than a few high-speed SATA 6Gb/s, USB 3.0, etc. devices will be able to saturate DMI, since it's only the equivalent of a PCIe x4 slot (1.0 speed for 1156, 2.0 for 1155/2011), while QPI is a much higher-capacity bus similar to AMD's HyperTransport.
While Intel seems determined to milk as much out of the (presumably) cheaper-to-implement DMI bus as it can, sooner or later they're going to either need to mainstream QPI or have the CPU die eat the SATA/USB3 controllers. I find the latter unlikely because it would require cramming even more data lines into the already overcrowded CPU socket area.
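For a rough sense of the DMI numbers above: treating DMI as a PCIe x4 link, as the comment does, a PCIe 1.x lane carries 250 MB/s per direction and a PCIe 2.0 lane 500 MB/s after 8b/10b encoding. SATA 6Gb/s and USB 3.0 also use 8b/10b, so their peak payload rates are about 600 MB/s and 500 MB/s. This is a back-of-the-envelope sketch: it ignores protocol overhead and assumes every device runs at its interface's full peak, which is a worst case; the function names are illustrative.

```python
# Peak per-direction payload rates in MB/s after 8b/10b encoding.
# Protocol overhead is ignored, so these are upper bounds.
PCIE_LANE_MB_S = {"1.0": 250, "2.0": 500}
SATA_6G_MB_S = 600  # 6 Gbit/s line rate * 8/10 encoding / 8 bits per byte
USB3_MB_S = 500     # 5 Gbit/s line rate * 8/10 encoding / 8 bits per byte


def dmi_mb_s(pcie_gen: str, lanes: int = 4) -> int:
    """DMI approximated as a PCIe link of the given generation and width."""
    return PCIE_LANE_MB_S[pcie_gen] * lanes


def devices_to_saturate(link_mb_s: int, device_mb_s: int) -> int:
    """Fewest devices at full peak whose combined rate meets or exceeds the link."""
    return -(-link_mb_s // device_mb_s)  # integer ceiling division


for gen, socket in [("1.0", "LGA 1156"), ("2.0", "LGA 1155")]:
    link = dmi_mb_s(gen)
    print(f"{socket}: DMI ~{link} MB/s; saturated by "
          f"{devices_to_saturate(link, SATA_6G_MB_S)} SATA 6Gb/s or "
          f"{devices_to_saturate(link, USB3_MB_S)} USB 3.0 devices at peak")
```

On these idealized numbers, two SATA 6Gb/s drives running flat out would already exceed a 1.0-speed DMI link, and four would exceed a 2.0-speed one.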