IDF has started and the first benchmarks of Nehalem are going to start popping up. It is without a doubt an impressive architecture with a much better platform to run on, but this CPU is not about giving you better frames per second in your favorite game than the Penryn family. Let me make that more clear: even when the GPU is not the bottleneck, it is likely that most games will not be significantly faster than on Penryn. We, the people behind will probably have the most fun with it, more than your favorite review crew at :-). And no, I have not seen any tests before I type this. Nehalem is about improving HPC, Database, and virtualization performance, and much less about gaming performance. Maybe this will change once games get some heavy physics threads, but not right away.

Why? Most games are about fast caches and super integer performance. After all, most of the floating-point action is already happening on the GPU. The Core 2 CPUs were a huge step forward in integer performance (not least because of memory disambiguation) compared to the CPUs of that time (P4 and K8). Nehalem is only a small step forward in integer performance, and the gains from that slight increase are mostly negated by the new cache system. In a previous post I told you that most games really like the huge L2 of the Core family. With Nehalem they are getting a 32KB L1 with a 4 cycle latency, then a very small (compared to the older Intel CPUs) 256KB L2 cache with a 12 cycle latency, and after that a pretty slow 40 cycle 8MB L3. On Penryn, they got a 3 cycle L1 and a 14 cycle 6144KB L2. The Penryn L2 is 24 times larger than Nehalem's!
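
To get a feel for what those latencies mean, here is a minimal back-of-the-envelope sketch of average memory access time for the two hierarchies. Only the cycle latencies come from the numbers above. The hit rates and the 200 cycle memory latency are made-up placeholders, not measurements of any game, and using the same memory latency for both ignores the help Nehalem gets from its integrated memory controller.

```python
def amat(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (latency_cycles, hit_rate) pairs, ordered from L1 outward.
    Each latency is the total load-to-use latency when the access hits at
    that level; hit_rate is the fraction of accesses reaching that level
    that hit there.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for latency, hit_rate in levels:
        total += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    # Whatever misses every cache level goes to main memory.
    return total + reach * memory_latency


# Latencies are from the article; hit rates are HYPOTHETICAL placeholders.
PENRYN = [(3, 0.90), (14, 0.95)]              # 3 cycle L1, 14 cycle 6144KB L2
NEHALEM = [(4, 0.90), (12, 0.70), (40, 0.85)]  # 4 cycle L1, 12 cycle L2, 40 cycle L3
MEMORY_LATENCY = 200                           # HYPOTHETICAL, same for both

if __name__ == "__main__":
    print("Penryn :", round(amat(PENRYN, MEMORY_LATENCY), 2), "cycles")
    print("Nehalem:", round(amat(NEHALEM, MEMORY_LATENCY), 2), "cycles")
```

With these placeholder hit rates the Nehalem hierarchy comes out slower on average, but the result depends entirely on the hit rates you plug in, so treat it as a way to play with the trade-off rather than as a prediction.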

The percentage of L2 cache misses for most games running on a Penryn CPU is extremely low. Now that is going to change. The integrated memory controller of Nehalem will help some, but the fact remains that the L3 is slow and the L2 is small. However, that doesn't mean Intel made a bad choice. Intel made a superbly good choice by improving the performance where Core (Merom/Penryn) was mediocre to good. Penryn was already a magnificent gaming CPU, but it could not beat the AMD competition in HPC benchmarks, and AMD put up a good fight in database performance benchmarks. Now Intel is ready to fix these shortcomings.

Most database code cannot use the wide architecture of Penryn very well. The number of instructions per cycle can be lower than 0.5, and waiting for memory is the most probable cause. SMT, or Hyper-Threading, can do wonders here: while one thread is stalled waiting for memory, the other thread continues working, and vice versa.
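
The latency-hiding idea can be sketched with a toy model. This is not how Nehalem's SMT is actually implemented. It assumes a hypothetical single-issue core where every thread alternates between a short burst of compute and a long memory stall, and the burst and stall lengths below are invented for illustration.

```python
def core_utilization(num_threads, compute_cycles, stall_cycles, total_cycles=8000):
    """Fraction of cycles in which a toy single-issue core does useful work.

    Each thread runs `compute_cycles` of work, then waits `stall_cycles`
    for memory. Stalls of different threads overlap; the core issues from
    at most one ready thread per cycle (lowest thread index wins).
    """
    work = [compute_cycles] * num_threads  # work left in the current burst
    stall = [0] * num_threads              # stall cycles left per thread
    busy = 0
    for _ in range(total_cycles):
        issued = False
        for t in range(num_threads):
            if stall[t] > 0:
                stall[t] -= 1              # waiting on memory
            elif not issued:
                work[t] -= 1               # this thread gets the core
                issued = True
                if work[t] == 0:           # burst done, go wait for memory
                    work[t] = compute_cycles
                    stall[t] = stall_cycles
        busy += issued
    return busy / total_cycles


if __name__ == "__main__":
    # HYPOTHETICAL workload: 2 cycles of work, then a 6 cycle memory stall.
    print("1 thread :", core_utilization(1, 2, 6))
    print("2 threads:", core_utilization(2, 2, 6))
```

In this toy setup the second thread simply fills cycles the first one would have wasted. Real gains are smaller, because the two threads also compete for caches and execution resources.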

Secondly, quad (and eight) socket performance is going to improve a lot as four Nehalems only have to keep four L3 caches in sync, while a similar Tigerton system has to keep eight L2 caches in sync. That is why the cache system is perfect for server performance, but a little less interesting for gaming performance.
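
One crude way to see why fewer caches are easier to keep in sync is to count how many pairs of caches could, in the worst case, have to agree with each other. This is only a rough illustration that assumes every cache may need to talk to every other cache. Real systems use snoop filters and similar tricks, so it says nothing about actual coherence traffic.

```python
from math import comb


def coherence_pairs(num_caches):
    """Number of distinct cache pairs that may need to stay coherent."""
    return comb(num_caches, 2)


# Cache counts taken from the paragraph above (quad-socket systems).
NEHALEM_L3_CACHES = 4   # one L3 per socket
TIGERTON_L2_CACHES = 8  # eight L2 caches across four sockets

if __name__ == "__main__":
    print("Nehalem pairs :", coherence_pairs(NEHALEM_L3_CACHES))
    print("Tigerton pairs:", coherence_pairs(TIGERTON_L2_CACHES))
```

Halving the number of caches cuts the number of pairs by much more than half, which is the intuition behind the server-friendly cache design.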

The massive bandwidth that the integrated tri-channel memory controller delivers should also do wonders for HPC code, and the new TLB architecture with EPT (Extended Page Tables) will make Nehalem shine in virtualized workloads compared to its older Core brothers.

No, Nehalem wasn't made for the gaming enthusiasts. Rather, it was made to please the IT and HPC people. So we say bring it on; it's just not that interesting for you gamers! ;-)

  • Saen11 - Tuesday, August 19, 2008 - link

    If Core i7 does not improve gaming performance, and AMD's new 45nm chips do, could this then give AMD a chance to steal the gaming performance crown?
  • DXRick - Tuesday, August 19, 2008 - link

    With Intel fantasizing about doing away with graphics cards and making ray tracing a reality for games, and Nehalem offering a 30% increase in CPU power, I would guess it is a stepping stone to Larrabee for gamers then. I can't see current game developers doing anything differently yet, as they will continue to use shaders (DirectX or OpenGL).

    The one exception is Microsoft's FlightSimX, which benefits more from CPU power than GPU.

    How will Nehalem impact developers looking at CUDA and PhysX?

  • rgallant - Tuesday, August 19, 2008 - link

    Looks like my E8600 @ 4.2 is good for a while.
  • ICE1966 - Tuesday, August 26, 2008 - link

    What in the hell do you need a 4.2GHz CPU for? Oh, wait, it's just bragging rights, LOL
  • Genx87 - Tuesday, August 19, 2008 - link

    I didn't know it had such a small L2 cache with a big L3. I will guess this is a process issue? With subsequent process shrinks, will we start to see a larger L2 per core?

    I would agree that that small of a cache will cripple it in gaming, possibly making any benefits minimal over Core 2 Duo.

    Should be interesting to see how it works out. Thought we had some benchmarks from last Spring that showed it 20-30% faster per clock in gaming? Either way I am on an E8400 and happy.
  • JohanAnandtech - Tuesday, August 19, 2008 - link

    I wouldn't call it a process issue. 4 x 256 KB L2 + 8 MB L3 is a lot of cache. :-)

    It is a trade-off as always. A small L2 cache for every core prevents two cores at full throttle from causing extra latencies and getting in each other's way. The L3 makes sure only one cache has to be kept coherent between the different sockets.

    It is a completely inclusive L3, so it has to be a lot bigger than the L2, otherwise it is not effective. And it is easier to keep the power cost down if you have a large L3 that is clocked lower and on a different power plane.
  • chizow - Tuesday, August 19, 2008 - link

    Great points about the smaller L2 and slower L3, I'll have to keep an eye on that when looking at any i7 gaming performance benchies. My main buying point will be overclockability however. If Nehalem allows for higher clockspeeds, that should provide the benefit in gaming with all else counter-balancing against Penryn. If it clocks about the same as Penryn I'll just wait for the 32nm refresh before upgrading from my P45/Q6600 and let DDR3 prices drop a bit more.
  • JohanAnandtech - Wednesday, August 20, 2008 - link

    I have my doubts about higher overclockability, especially if you compare with one of the youngest steppings of Penryn. After all, a tri-channel memory controller and 2 QPI links do not come for free. Nehalem is a bit more power hungry running at full speed (but it can also shut down its cores so it will consume less running light tasks). I would expect Nehalem to stay a bit behind Penryn in clock speeds unless Intel artificially keeps Penryn low with newer steppings.
  • IceBreakerG - Tuesday, August 19, 2008 - link

    This is interesting information. Right now I'm trying to decide if I should just go ahead and build a Q9550 system or try to wait for Nehalem. My current system, a single-core Athlon 64 3800+, has been showing its age for quite a while now, and I need an upgrade. Since gaming performance isn't expected to go up much, but other areas are, I'm still on the fence.

    I have my Xbox 360 for games, so gaming isn't too important (as long as I can play Roller Coaster Tycoon 3 lol). However, I do a lot of video encoding and music production. Too many decisions. I think either way, whatever I decide, the new system will be significantly faster than what I have now so either will be a nice upgrade for me.
  • gochichi - Thursday, August 21, 2008 - link

    I think waiting is silly, particularly as DDR2 memory is so cheap and so good.

    Get a Quad-core, get 8GB of RAM, and never look back... don't wait another minute.
