Wednesday, October 1, 2014

Nvidia Maxwell and GTX 970 reviewed: Crushing all challengers


The GTX 980 and 970 are the Big Billy Goats Gruff to their smaller budget cousin, the GTX 750 Ti. When the first Maxwell GPU arrived this spring, it was clear that Nvidia had something potent on its hands. Maxwell showed enormous promise, leaping over Kepler’s compute performance in multiple benchmarks. Swift price cuts from AMD took the wind out of the 750 Ti’s launch position, but it was clear that this new core was a warning shot. As of now, Nvidia is firing both barrels.

Tonight, Nvidia is launching its next-generation Maxwell architecture alongside a slew of new software announcements and product features. With limited time to cover both sides of the equation, Sebastian and I split the difference: this article will focus on the performance of the GTX 980 and GTX 970, while Seb talks about new antialiasing methods, resolution upscaling, hardware-accelerated voxels, and Maxwell’s other new features.

The GM204 core

We’ll kick things off with a review of Maxwell’s core architecture and features. Some of this information was covered in the GTX 750 Ti review, so refer back to that in-depth discussion if you want more detail. GM204 (GTX 970 & 980) keeps the same organizational structure as GTX 750 Ti, but scales the entire architecture up. The Maxwell GPU is significantly more fine-grained than Kepler, and should be easier to program. When Nvidia shifted to GK104/GK110, it vastly increased the amount of instruction-level parallelism (ILP) that programs needed to offer in order to perform well on Kepler’s architecture. Maxwell moves the ball back in the other direction.
The following chart shows how Maxwell (GTX 980) compares to Kepler (GTX 680).
[Chart: Maxwell (GTX 980) vs. Kepler (GTX 680)]
There are a few important differences we want to call out, as detailed below:
Twice the front-ends: One reason that Kepler’s compute performance was so much lower than AMD’s was the low number of front-end processors in the GPU. GK104 had just eight SMX blocks. GK110 improved the situation by offering up to 15 SMXs; Maxwell’s GM204 has one more block than that, for a total of 16 SMMs.
Twice the render outputs (ROPs): Another feature that favored AMD’s high-end Hawaii cards was the increased number of render outputs, or ROPs. Nvidia’s mainstream enthusiast parts topped out at 32, while the GTX 780 family had 48. Hawaii, in contrast, had 64, as does the GTX 980.
Larger die, higher density: Maxwell is built on 28nm, like Kepler, but Nvidia still managed to increase total transistor density. AMD’s Tonga (R9 285) appears to be the densest reasonably high-end GPU on 28nm; it’s unlikely we’ll see any significant improvements until 20nm hardware arrives next year.
[Chart: Transistor density]
The first change will improve Maxwell’s execution efficiency and compute performance, while the second makes the GPU more competitive in 4K mode. On top of these changes, there are the increased core counts, higher transistor density, a significantly larger die, and a reduced TDP. Pulling down to 165W is a major achievement considering that this is the full consumer version of the chip (it’s assumed that Nvidia will roll out a workstation/Tesla-class part at a later date).
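To put the density claim in concrete terms, here is a rough sketch of the arithmetic. The transistor counts and die areas below are the commonly quoted public figures for GK104 and GM204; they are assumptions brought in for illustration and are not taken from the chart above.

```python
# Transistor density = transistor count / die area.
# The figures below are commonly quoted public specs, used here as
# assumptions for illustration; they do not come from this article.
CHIPS = {
    "GK104 (GTX 680)": {"transistors_millions": 3540, "die_area_mm2": 294},
    "GM204 (GTX 980)": {"transistors_millions": 5200, "die_area_mm2": 398},
}


def density(transistors_millions, die_area_mm2):
    """Return density in millions of transistors per square millimetre."""
    return transistors_millions / die_area_mm2


densities = {name: density(**spec) for name, spec in CHIPS.items()}

for name, value in densities.items():
    print(f"{name}: {value:.2f} M transistors/mm^2")
```

Under those assumed figures, GM204 comes out only modestly denser than GK104, which fits the point that both chips share the same 28nm process.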
The sharp-eyed among you will note that Maxwell’s memory bus isn’t any faster than that of current top-end GTX 770 cards. Falling back to the 680 for this comparison lets Nvidia show a modest improvement that isn’t really reflected in hardware, since the 680 hasn’t been top-end for quite some time. Fortunately, Maxwell has some significant under-the-hood improvements in the memory efficiency department.

Doing more with less

One of the major improvements Maxwell makes to Nvidia’s GPU architecture is a third-generation color compression algorithm that can dramatically reduce memory bandwidth consumption. Nvidia is claiming that the real-world reduction in memory bandwidth use for shipping titles is 17-25%.
[Chart: Maxwell memory]
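As a back-of-the-envelope illustration of what a 17-25% reduction in memory traffic could mean, the sketch below converts it into an "effective" bandwidth figure. The 256-bit bus width and 7 Gbps data rate are assumed GTX 980 specifications, not numbers stated in this article, and the effective-bandwidth framing is one simple way to read Nvidia's claim rather than Nvidia's own method.

```python
def raw_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8


def effective_bandwidth_gbs(raw_gbs, traffic_reduction):
    """Bandwidth an uncompressed design would need to move the same data.

    traffic_reduction is the fraction of memory traffic saved by
    compression, e.g. 0.25 for a 25% reduction.
    """
    return raw_gbs / (1.0 - traffic_reduction)


# Assumed GTX 980 memory configuration (not from the article).
raw = raw_bandwidth_gbs(bus_width_bits=256, data_rate_gbps=7.0)

for reduction in (0.17, 0.25):
    effective = effective_bandwidth_gbs(raw, reduction)
    print(f"{reduction:.0%} less traffic: {raw:.0f} GB/s raw "
          f"behaves like {effective:.0f} GB/s")
```

The division by (1 - reduction) is the key step: saving a quarter of the traffic is equivalent to having a third more bandwidth, not a quarter more.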
Nvidia had good reason to focus on improving memory bandwidth utilization as opposed to just widening the bus. A GPU’s memory bus tends to ring the outside of the core, so a wider bus necessitates a larger die and higher production costs. Historically, both AMD and Nvidia have focused on keeping the bus as small as possible. In some cases, GPU architectures debut with large but less-efficient bus structures, then transition to more efficient designs (G80/G92b for Nvidia, R600/RV670 for AMD). In both cases, the newer cards equaled or outperformed their older counterparts, despite having smaller memory buses.
