It has taken about 2 years longer than we’d normally see, but the next full generation of GPUs is finally upon us. Powered by FinFET-based nodes at TSMC and GlobalFoundries, both NVIDIA and AMD have released new GPUs with new architectures built on new manufacturing nodes. AMD and NVIDIA did an amazing job making the best of 28nm over its 4 year stretch, but now at long last true renewal is at hand for the discrete GPU market.

Back in May we took a first look at the first of these cards, NVIDIA’s GeForce GTX 1080 Founders Edition. Launched at $700, it was immediately the flagship for the FinFET generation. Now today, at long (long) last, we will be taking a complete, in-depth look at the GTX 1080 Founders Edition and its sibling the GTX 1070 Founders Edition. Architecture, overclocking, more architecture, new memory technologies, new features, and of course copious benchmarks. So let’s get started on this belated look at the latest generation of GPUs and video cards from NVIDIA.

NVIDIA GPU Specification Comparison

| | GTX 1080 | GTX 1070 | GTX 980 | GTX 970 |
|---|---|---|---|---|
| CUDA Cores | 2560 | 1920 | 2048 | 1664 |
| Texture Units | 160 | 120 | 128 | 104 |
| ROPs | 64 | 64 | 64 | 56 |
| Core Clock | 1607MHz | 1506MHz | 1126MHz | 1050MHz |
| Boost Clock | 1733MHz | 1683MHz | 1216MHz | 1178MHz |
| Memory Clock | 10Gbps GDDR5X | 8Gbps GDDR5 | 7Gbps GDDR5 | 7Gbps GDDR5 |
| Memory Bus Width | 256-bit | 256-bit | 256-bit | 256-bit |
| VRAM | 8GB | 8GB | 4GB | 4GB |
| FP64 | 1/32 | 1/32 | 1/32 | 1/32 |
| TDP | 180W | 150W | 165W | 145W |
| GPU | GP104 | GP104 | GM204 | GM204 |
| Transistor Count | 7.2B | 7.2B | 5.2B | 5.2B |
| Manufacturing Process | TSMC 16nm | TSMC 16nm | TSMC 28nm | TSMC 28nm |
| Launch Date | 05/27/2016 | 06/10/2016 | 09/18/2014 | 09/18/2014 |
| Launch Price | MSRP: $599, Founders: $699 | MSRP: $379, Founders: $449 | $549 | $329 |

As a quick refresher, here are the specifications for the new cards. At a high level the Pascal architecture (as implemented in GP104) is a mix of old and new; it’s not a revolution, but it’s an important refinement. Maxwell as an architecture was very successful for NVIDIA both at the consumer level and the professional level, and for the consumer iterations of Pascal, NVIDIA has not made any radical changes. The basic throughput of the architecture has not changed – the ALUs, texture units, ROPs, and caches all perform similarly to how they did in GM2xx.

Consequently the performance aspects of consumer Pascal – we’ll ignore GP100 for the moment – are pretty easy to understand. NVIDIA’s focus on this generation has been on pouring on the clockspeed to push total compute throughput to 8.9 TFLOPs, and updating their memory subsystem to feed the beast that is GP104.
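For readers who want to see where the 8.9 TFLOPs figure comes from, here is a minimal sketch of the arithmetic. It assumes the usual convention of counting one fused multiply-add per CUDA core per clock as 2 FLOPs, and it uses the GTX 1080's core count and boost clock from the table above; the function name is ours, purely for illustration.

```python
# Theoretical peak FP32 throughput = CUDA cores x FLOPs per core per clock x clock.
# Assumption: one fused multiply-add (FMA) per core per clock, counted as 2 FLOPs.

def peak_fp32_tflops(cuda_cores: int, clock_mhz: float,
                     flops_per_core_per_clock: int = 2) -> float:
    """Return the theoretical single-precision peak in TFLOPS."""
    flops_per_second = cuda_cores * flops_per_core_per_clock * clock_mhz * 1e6
    return flops_per_second / 1e12


# GTX 1080: 2560 CUDA cores at the 1733MHz boost clock (from the spec table).
gtx_1080_tflops = peak_fp32_tflops(2560, 1733)
print(f"GTX 1080 at boost clock: {gtx_1080_tflops:.2f} TFLOPS")
```

This is a paper number at the rated boost clock; a card that sustains higher or lower clocks in practice would land above or below it.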

GeForce GTX 1080

The GeForce GTX 1080 is a fully enabled implementation of GP104. This means 2560 CUDA cores split up over 20 SMs operating at a blistering boost clock of 1733MHz. NVIDIA is positioning GTX 1080 as a full generational update over GTX 980, and thanks to a combination of a slightly wider GPU and a much faster clockspeed, they can generally deliver on this. By the numbers, GTX 1080 offers 78% more raw compute, texturing, and geometry performance, and 43% more ROP throughput. Of course the latter is as much a product of memory bandwidth as it is the ROPs themselves, and for that NVIDIA has some new memory technologies.
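The generational percentages above can be reproduced from the specification table alone. The sketch below assumes that shader, texture, and geometry throughput scale with unit count times boost clock, and that ROP throughput scales with ROP count times boost clock; the `Gpu` class and `percent_gain` helper are illustrative names of our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    cuda_cores: int
    rops: int
    boost_mhz: int


def percent_gain(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return (new / old - 1.0) * 100.0


# Figures taken from the specification table.
gtx_1080 = Gpu("GTX 1080", cuda_cores=2560, rops=64, boost_mhz=1733)
gtx_980 = Gpu("GTX 980", cuda_cores=2048, rops=64, boost_mhz=1216)

# Assumption: throughput is proportional to (unit count x boost clock).
shader_gain = percent_gain(gtx_1080.cuda_cores * gtx_1080.boost_mhz,
                           gtx_980.cuda_cores * gtx_980.boost_mhz)
rop_gain = percent_gain(gtx_1080.rops * gtx_1080.boost_mhz,
                        gtx_980.rops * gtx_980.boost_mhz)

print(f"Compute/texturing/geometry: +{shader_gain:.0f}%")
print(f"ROP throughput: +{rop_gain:.0f}%")
```

Since both cards have 64 ROPs, the ROP gain is purely a function of clockspeed, while the shader gain combines the clockspeed increase with the 25% wider GPU.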

Feeding the beast that is GTX 1080 is 8GB of GDDR5X. A new memory standard that extends the effective memory bandwidth of GDDR5, GTX 1080’s GDDR5X runs at 10Gbps, and is attached to a 256-bit memory bus. This gives GTX 1080 a full 320GB/sec of memory bandwidth to play with, 43% more than GTX 980. And as we’ll see in the coming architectural pages, these raw numbers don’t factor in the architectural improvements that allow the Pascal GPUs to stretch their memory bandwidth even further.
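The bandwidth figure follows directly from the per-pin data rate and the bus width. As a quick sketch, with a helper name of our own choosing:

```python
def memory_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/sec: per-pin data rate x bus width, bits to bytes."""
    return data_rate_gbps * bus_width_bits / 8


gtx_1080_bw = memory_bandwidth_gbs(10, 256)  # 10Gbps GDDR5X on a 256-bit bus
gtx_980_bw = memory_bandwidth_gbs(7, 256)    # 7Gbps GDDR5 on a 256-bit bus
bw_gain_pct = (gtx_1080_bw / gtx_980_bw - 1) * 100

print(f"GTX 1080: {gtx_1080_bw:.0f}GB/sec, GTX 980: {gtx_980_bw:.0f}GB/sec")
print(f"Increase: {bw_gain_pct:.0f}%")
```

With the bus width unchanged between the two cards, the whole gain comes from the jump from 7Gbps to 10Gbps.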

Finally, GTX 1080’s TDP is rated at 180W. This is a slight increase from the past generation, where GTX 980 required 165W. Video card specifications are of course a sliding scale – balancing desired performance with cooling capabilities and power consumption – and ultimately NVIDIA has opted to eat a slight increase in power consumption to allow GTX 1080 to deliver more performance than it otherwise would.

GeForce GTX 1070

Meanwhile below the GTX 1080 we have its lower price and lower performance sibling, the GTX 1070. The standard high-end salvage part, GTX 1070 trades off fewer functional blocks and the lower resulting performance in exchange for a significantly lower price than the GTX 1080. From a hardware perspective, the GTX 1070 utilizes GP104 with 1 of the 4 Graphics Processing Clusters (GPCs) disabled. Relative to GTX 1080, this knocks off around 25% of the shading/texturing/compute performance. However the memory controllers and ROP partitions remain untouched. With this configuration NVIDIA is pitching the GTX 1070 as a full generational update to the GTX 970, and with any luck, the GTX 1070 will be as well accepted as its extremely successful predecessor.

All told then, GTX 1070 provides 1920 CUDA cores split up over 15 SMs. Those 15 SMs are in turn running at a base clockspeed of 1506MHz and a boost clock of 1683MHz. This is slightly lower than GTX 1080, but as we’ll see in our full benchmark section, the official clockspeeds have very little impact; it’s the disabled GPC that really makes the difference. By the numbers, relative to the GTX 970 the GTX 1070 offers 65% more shading, texturing, and geometry throughput, and 63% more ROP throughput. The latter comes courtesy of both the higher clockspeeds and the fact that GTX 1070 ships with all 64 ROPs enabled, versus 56 of 64 on GTX 970.
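To make the salvage configuration concrete, the sketch below derives the GTX 1070's unit counts from the full GP104 figures (2560 cores over 20 SMs in 4 GPCs) and then reproduces the comparison against GTX 970. It assumes SMs are spread evenly across GPCs and that throughput scales with unit count times boost clock; the helper names are ours.

```python
# Full GP104 as described for GTX 1080: 2560 CUDA cores, 20 SMs, 4 GPCs.
CORES_PER_SM = 2560 // 20  # 128
SMS_PER_GPC = 20 // 4      # 5, assuming SMs are evenly divided between GPCs


def enabled_units(active_gpcs: int) -> tuple[int, int]:
    """Return (SM count, CUDA core count) for a given number of active GPCs."""
    sms = active_gpcs * SMS_PER_GPC
    return sms, sms * CORES_PER_SM


def percent_gain(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return (new / old - 1.0) * 100.0


# GTX 1070: 1 of the 4 GPCs disabled.
sms_1070, cores_1070 = enabled_units(active_gpcs=3)

# GTX 1070 (1683MHz boost, 64 ROPs) vs. GTX 970 (1664 cores, 56 ROPs, 1178MHz boost).
shader_gain = percent_gain(cores_1070 * 1683, 1664 * 1178)
rop_gain = percent_gain(64 * 1683, 56 * 1178)

print(f"GTX 1070: {sms_1070} SMs, {cores_1070} CUDA cores")
print(f"vs. GTX 970: +{shader_gain:.0f}% shading, +{rop_gain:.0f}% ROP throughput")
```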

As for memory, GTX 1070 doesn’t get GDDR5X. Instead the card gets 8GB of GDDR5 running at 8Gbps. This delivers a total memory bandwidth of 256GB/sec, and again unlike GTX 970, there is nothing going on with partitions here, so all of that memory and all of that bandwidth is operating in one contiguous partition, giving the GTX 1070 an effective memory bandwidth increase of 31%. GTX 1070 is the first NVIDIA card to ship with 8Gbps GDDR5, a memory speed I once didn’t think possible. NVIDIA and the memory partners are pushing GDDR5 to the limit by doing this, but at this point in time this is the most economical way to boost memory bandwidth without resorting to more exotic and expensive solutions like GDDR5X.
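The 31% figure is larger than the raw bandwidth numbers alone would suggest, so a short worked example may help. It relies on one assumption that is not spelled out in this section: that GTX 970's fast 3.5GB segment is served by 7 of its 8 memory channels, which works out to 196GB/sec rather than the nominal 224GB/sec.

```python
def memory_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/sec: per-pin data rate x bus width, bits to bytes."""
    return data_rate_gbps * bus_width_bits / 8


gtx_1070_bw = memory_bandwidth_gbs(8, 256)     # 8Gbps GDDR5, one contiguous partition
gtx_970_nominal = memory_bandwidth_gbs(7, 256)  # 7Gbps GDDR5, nominal figure

# Assumption (not stated in this section): only 7 of GTX 970's 8 channels
# serve the fast 3.5GB segment.
gtx_970_effective = gtx_970_nominal * 7 / 8

nominal_gain_pct = (gtx_1070_bw / gtx_970_nominal - 1) * 100
effective_gain_pct = (gtx_1070_bw / gtx_970_effective - 1) * 100

print(f"GTX 1070: {gtx_1070_bw:.0f}GB/sec")
print(f"GTX 970: {gtx_970_nominal:.0f}GB/sec nominal, {gtx_970_effective:.0f}GB/sec effective")
print(f"Gain: {nominal_gain_pct:.0f}% nominal, {effective_gain_pct:.0f}% effective")
```

On nominal numbers alone the increase would be about 14%; it is the partitioned memory on GTX 970 that pushes the effective gain to roughly 31%.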

GTX 1070 is rated for a 150W TDP; this is a smaller, 5W increase over its predecessor. Despite the official TDP, it should be noted that NVIDIA is not pitching this card as their 150W champion for systems with a single 6-pin PCIe power cable, and it will require a more powerful 8-pin cable. For systems that need a true sub-150W card, this is where the GTX 1060 will step in. Otherwise NVIDIA is making a very interesting power play here: what is now the second most powerful video card on the market does so on just 150W.

Cards, Pricing, & Availability

For the GTX 1000 series, NVIDIA has undertaken a significant change in how they handle reference boards and how those boards are priced. What were once reference boards are now being released as the Founders Edition boards. These boards are largely similar to NVIDIA’s last-generation reference boards, built using a standard PCB and NVIDIA’s high-end blower cooler, along with some additional cooling upgrades. The Founders Edition cards will, in turn, not be sold at NVIDIA’s general MSRP for each family, but rather they will be sold as premium cards for around $70-$100 more.

As a result we have two prices to talk about. For the GTX 1080, the family MSRP is $599. At the base level this is a slight price increase over the GTX 980, which launched at $549. As the Founders Edition cards are not being sold at this price, that price point is instead being filled by semi and fully custom cards from NVIDIA’s partners. These custom cards offer a mix of designs, but at the cheapest level (those cards closest to the MSRP) we’re predominantly looking at dual fan open air cooled cards. The rest of the lineup is filled by more advanced cards (including some closed loop liquid coolers) with factory overclocks and other features that are sold at a premium price. The GTX 1080 Founders Edition card, for its part, fits into this picture at $699, a $100 premium.

GeForce GTX 1080 Configurations

| | Base | Founders Edition |
|---|---|---|
| Core Clock | 1607MHz | 1607MHz |
| Boost Clock | 1733MHz | 1733MHz |
| Memory Clock | 10Gbps GDDR5X | 10Gbps GDDR5X |
| Cooler | Manufacturer Custom (Typical: 2 or 3 Fan Open Air) | NVIDIA Reference (Blower w/Vapor Chamber) |
| Price | Starting at $599 | $699 |

The story then is much the same for the GTX 1070. Its family MSRP is $379, while its Founders Edition counterpart is being sold for $449. At $379 for the family MSRP, this is a $50 price increase over the GTX 970, and I am curious over the long run whether this will significantly impact sales. One of the factors that made GTX 970 such a well-received card was its price, and this takes away from that by a bit. Otherwise, as with the GTX 1080, the partners’ custom cards for the GTX 1070 run the gamut from simple dual fan cards at the cheapest prices, up to premium, factory overclocked cards at the highest prices.

GeForce GTX 1070 Configurations

| | Base | Founders Edition |
|---|---|---|
| Core Clock | 1506MHz | 1506MHz |
| Boost Clock | 1683MHz | 1683MHz |
| Memory Clock | 8Gbps GDDR5 | 8Gbps GDDR5 |
| Cooler | Manufacturer Custom (Typical: 2 or 3 Fan Open Air) | NVIDIA Reference (Blower w/Vapor Chamber) |
| Price | Starting at $379 | $449 |

Unfortunately for everyone involved, the plan for pricing and reality haven’t quite agreed with each other. Even now, 2 months after the launch of the GTX 1080, card supplies are slim. There is effectively a shortage of GTX 1080 cards, as while NVIDIA insists they are continuing to ship out a good supply, those cards appear to be getting plucked off of virtual and physical shelves almost as quickly. As of the time this paragraph was written, Newegg only has a single GTX 1080 in stock, a Founders Edition card at $699.

For the last several generations it has been pretty common for the first batch or two of high-end cards to sell out; however, to be sold out for 2 months is a lot less common. Other than NVIDIA’s Titan series cards, which are a special case due to their prosumer market, I can’t immediately recall the last time an NVIDIA flagship card was this hard to get this late after a launch. For NVIDIA and its partners there are worse problems in the world – it’s better to have too few cards than too many cards that you can’t sell – but it certainly puts a damper on things for both the partners and for customers.

Meanwhile the GTX 1070 situation is noticeably better, though still not great. About half of the models that Newegg carries are in stock at any given time. So potential GTX 1070 owners have more options, though if they’re after a specific card they may find themselves waiting.

But the real problem with this shortage is that it has removed any incentive to keep prices close to NVIDIA’s MSRP. GTX 1070 prices start at $429 instead of $379, while GTX 1080 prices start at $649 (and if you actually want a card in stock, that’ll be $699). These are prices that are closer to last generation’s GTX 980 Ti/980 prices than they are to 980/970 prices, and it means that the actual GTX 1000 series price premium is much higher as it stands, at $100+ compared to the last generation. Given that these cards keep selling out, clearly there are enough buyers willing to pay these prices – it’s the free market in action – but it means NVIDIA’s MSRPs are for the moment an imaginary number. At this point all that we can do is hope that once the shortage breaks, there will be more intensive competition between the partners and retailers, and prices will fall to MSRP.
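To put numbers on the gap, here is a small sketch comparing the street prices quoted above against both NVIDIA's MSRPs and the launch prices of the previous generation. All dollar figures come from this page; the dictionary layout is ours.

```python
# Launch MSRPs and current minimum street prices, as quoted in this article (USD).
prices = {
    "GTX 1080": {"msrp": 599, "street": 649, "predecessor": "GTX 980", "predecessor_launch": 549},
    "GTX 1070": {"msrp": 379, "street": 429, "predecessor": "GTX 970", "predecessor_launch": 329},
}

premiums = {}
for card, p in prices.items():
    over_msrp = p["street"] - p["msrp"]
    over_last_gen = p["street"] - p["predecessor_launch"]
    premiums[card] = (over_msrp, over_last_gen)
    print(f"{card}: ${over_msrp} over MSRP, "
          f"${over_last_gen} over the {p['predecessor']} launch price")
```

Both cards currently carry a $50 markup over MSRP, which stacks on top of the $50 MSRP increase over their predecessors to produce the $100 generational premium.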

As for the larger competitive landscape, as we’re looking at high-end cards at the start of a new generation, there really isn’t any competition to speak of. The GTX 1000 series sets a new bar for performance, and while last generation cards are being priced to clear out inventories, they aren’t performance competitive with the new cards. Meanwhile stalwart competitor AMD has opted to go after the mainstream market first rather than starting at the high-end. This means that the GTX 1080 and GTX 1070 will not have any competition for at least the next few months, leaving NVIDIA solely in the driver’s seat at the high-end, and in sole possession of the GPU performance crown.

Summer 2016 GPU Pricing Comparison

| AMD | Price | NVIDIA |
|---|---|---|
| | $699 | GeForce GTX 1080 |
| Radeon R9 Fury X, Radeon R9 Nano | $459 | |
| | $439 | GeForce GTX 1070 |
| | $419 | GeForce GTX 980 Ti |
| Radeon R9 390X | $329 | |
| Radeon R9 390 | $249 | GeForce GTX 1060, GeForce GTX 970 |
| Radeon RX 480 (8GB) | $239 | |

200 Comments


  • Ryan Smith - Friday, July 22, 2016

    2) I suspect the v-sync comparison is a 3 deep buffer at a very high framerate.
  • lagittaja - Sunday, July 24, 2016

    1) It is a big part of it. Remember how bad 20nm was?
    The leakage was really high so Nvidia/AMD decided to skip it. FinFET's helped reduce the leakage for the "14/16"nm node.

    That's apples to oranges. CPU's are already 3-4Ghz out of the box.

    RX480 isn't showing it because the 14nm LPP node is a lemon for GPU's.
    You know what's the optimal frequency for Polaris 10? 1Ghz. After that the required voltage shoots up.
    You know, LPP where the LP stands for Low Power. Great for SoC's but GPU's? Not so much.
    "But the SoC's clock higher than 2Ghz blabla". Yeah, well a) that's the CPU and b) it's freaking tiny.

    How are we getting 2Ghz+ frequencies with Pascal which so closely resembles Maxwell?
    Because of the smaller manufacturing node. How's that possible? It's because of FinFET's which reduced the leakage of the 20nm node.
    Why couldn't we have higher clockspeeds without FinFET's at 28nm? Because power.
    28nm GPU's capped around the 1.2-1.4Ghz mark.
    20nm was no go, too high leakage current.
    16nm gives you FinFET's which reduced the leakage current dramatically.
    What does that enable you to do? Increase the clockspeed..
    Here's a good article
    http://www.anandtech.com/show/8223/an-introduction...
  • lagittaja - Sunday, July 24, 2016

    As an addition to the RX 480 / Polaris 10 clockspeed
    GCN2-GCN4 VDD vs Fmax at avg ASIC
    http://i.imgur.com/Hdgkv0F.png
  • timchen - Thursday, July 21, 2016

    Another question is about boost 3.0: given that we see 150-200 Mhz gpu offset very common across boards, wouldn't it be beneficial to undervolt (i.e. disallow the highest voltage bins corresponding to this extra 150-200 Mhz) and offset at the same time to maintain performance at lower power consumption? Why did Nvidia not do this in the first place? (This is coming from reading Tom's saying that 1060 can be a 60w card having 80% of its performance...)
  • AnnonymousCoward - Thursday, July 21, 2016

    NVIDIA, get with the program and support VESA Adaptive-Sync already!!! When your $700 card can't support the VESA standard that's in my monitor, and as a result I have to live with more lag and lower framerate, something is seriously wrong. And why wouldn't you want to make your product more flexible?? I'm looking squarely at you, Tom Petersen. Don't get hung up on your G-sync patent and support VESA!
  • AnnonymousCoward - Thursday, July 21, 2016

    If the stock cards reach the 83C throttle point, I don't see what benefit an OC gives (won't you just reach that sooner?). It seems like raising the TDP or under-voltaging would boost continuous performance. Your thoughts?
  • modeless - Friday, July 22, 2016

    Thanks for the in depth FP16 section! I've been looking forward to the full review. I have to say this is puzzling. Why put it on there at all? Emulation would be faster. But anyway, NVIDIA announced a new Titan X just now! Does this one have FP16 for $1200? Instant buy for me if so.
  • Ryan Smith - Friday, July 22, 2016

    Emulation would be faster, but it would not be the same as running it on a real FP16x2 unit. It's the same purpose as FP64 units: for binary compatibility so that developers can write and debug Tesla applications on their GeForce GPU.
  • hoohoo - Friday, July 22, 2016

    Excellent article, Ryan, thank you!

    Especially the info on preemption and async/scheduling.

    I expected the preemption might be expensive in some circumstances, but I didn't quite expect it to push the L2 cache though! Still this is a marked improvement for nVidia.
  • hoohoo - Friday, July 22, 2016

    It seems like the preemption is implemented in the driver though? Are there actual h/w instructions to as it were "swap stack pointer", "push LDT", "swap instruction pointer"?
