As you can see, the wide memory partitions and bus width, combined with GDDR5 memory (quad data rate), give the GPU a very high effective framebuffer bandwidth. Let's again put most of the data in a chart to get a better overview of the changes:
GeForce GTX 480
GeForce GTX 580
GeForce GTX 680
Streaming Multiprocessors (SM)
Graphics Clock (Core)
Shader Processor Clock
Memory Clock / Data rate: 924 MHz / 3696 MHz (GeForce GTX 480), 1000 MHz / 4000 MHz (GeForce GTX 580), 1502 MHz / 6008 MHz (GeForce GTX 680)
Power connectors: 1x6-pin PEG, 1x8-pin PEG (GeForce GTX 480), 1x6-pin PEG, 1x8-pin PEG (GeForce GTX 580)
Max board power (TDP)
Recommended Power supply
GPU Thermal Threshold: 105 degrees C (GeForce GTX 480), 97 degrees C (GeForce GTX 580), 98 degrees C (GeForce GTX 680)
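The memory figures in the chart follow from simple arithmetic, which the short Python sketch below walks through. It assumes that "quad data rate" means four transfers per memory clock and that 1 GB equals 10^9 bytes; the 256-bit bus width is the GK104 figure quoted further down in this article. The function and variable names are made up for illustration.

```python
# Memory clocks in MHz, taken from the chart in this article.
MEMORY_CLOCK_MHZ = {
    "GeForce GTX 480": 924,
    "GeForce GTX 580": 1000,
    "GeForce GTX 680": 1502,
}

# Assumption: GDDR5 "quad data rate" = 4 transfers per memory clock.
GDDR5_TRANSFERS_PER_CLOCK = 4


def effective_data_rate_mhz(memory_clock_mhz: int) -> int:
    """Effective data rate (MHz) for a given GDDR5 memory clock."""
    return memory_clock_mhz * GDDR5_TRANSFERS_PER_CLOCK


def bandwidth_gb_per_s(data_rate_mhz: int, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s, assuming 1 GB = 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mhz * 1e6 * bytes_per_transfer / 1e9


for card, clock in MEMORY_CLOCK_MHZ.items():
    print(f"{card}: {clock} MHz -> {effective_data_rate_mhz(clock)} MHz effective")

# 256-bit bus as listed for the GK104 in this article.
gtx680_bandwidth = bandwidth_gb_per_s(effective_data_rate_mhz(1502), 256)
print(f"GeForce GTX 680 peak bandwidth: {gtx680_bandwidth:.1f} GB/s")
```

The effective rates it prints match the data rate column of the chart, so the two numbers in each cell are really one number expressed two ways.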
So we talked about the core clocks, specifications and memory partitions. Obviously there's a lot more to talk through.
To understand a graphics processor you simply need to break it down into pieces. Let's first look at the raw data that most of you can grasp. This bit is about the Kepler architecture; if you're not interested in g33k talk, by all means browse to the next page.
Above we see the GK104 block diagram, which lays out the Kepler architecture. Let's break it down into bits and pieces. The GK104 has:
1536 CUDA processors (Shader cores)
8 CUDA core clusters (SMX), each with 192 CUDA cores
8 geometry units
4 raster Units
128 Texture Units
32 ROP engines
256-bit GDDR5 memory bus
The more important thing to focus on is the SM clusters (blocks of shader processors, or SMX as NVIDIA likes to call them for the GTX 680), each of which has 192 shader processors. That's radically different from Fermi; the GeForce GTX 580, for example, had 32 shader processors per SM cluster. 1536 / 192 = 8 shader clusters (SMX). Let's blow up one such cluster:
Above is the block diagram for a single shader processor cluster, aka SM, or SMX as NVIDIA now calls it. The new SMX has quite a bit more bite in terms of shader, texture and geometry processing: 192 CUDA cores, six times the number of cores per SM compared to Fermi. Now, at the end of the pipeline we run into the ROP (Raster Operation) engines, and the GTX 680 again has 32 of them for features like pixel blending and AA.
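The cluster math can be checked in a few lines of Python. This is only a sketch using the figures quoted in this article (1536 cores in total, 192 per SMX, 32 per Fermi SM); the constant names are invented for the example.

```python
# Figures quoted in this article.
TOTAL_CUDA_CORES_GTX680 = 1536
CORES_PER_SMX_KEPLER = 192
CORES_PER_SM_FERMI = 32  # GeForce GTX 580

# Number of SMX clusters on the GTX 680.
smx_count = TOTAL_CUDA_CORES_GTX680 // CORES_PER_SMX_KEPLER

# How many times more cores a Kepler SMX holds than a Fermi SM.
cores_ratio = CORES_PER_SMX_KEPLER // CORES_PER_SM_FERMI

print(f"SMX clusters on GTX 680: {smx_count}")
print(f"Cores per cluster, Kepler vs Fermi: {cores_ratio}x")
```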
There's a total of 128 texture filtering units available for the GeForce GTX 680. The math is simple here: each SMX has 16 texture units tied to it, against 4 per SM on the GeForce GTX 580.
GeForce GTX 580 has 16 SMs X 4 Texture units = 64
GeForce GTX 680 has 8 SMs X 16 Texture units = 128
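The same multiplication, written as a tiny Python helper for anyone who wants to plug in other configurations. The function name is hypothetical and the inputs are the SM counts and per-SM texture units listed above.

```python
def total_texture_units(sm_count: int, texture_units_per_sm: int) -> int:
    """Total texture units = number of SMs x texture units per SM."""
    return sm_count * texture_units_per_sm


gtx580_texture_units = total_texture_units(sm_count=16, texture_units_per_sm=4)
gtx680_texture_units = total_texture_units(sm_count=8, texture_units_per_sm=16)

print(f"GeForce GTX 580: {gtx580_texture_units} texture units")
print(f"GeForce GTX 680: {gtx680_texture_units} texture units")
```

So the GTX 680 doubles the texture unit count with half as many clusters.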
Above is the GK104 host interface: the GigaThread engine, four GPCs, four memory controllers, the ROP partitions and a 512 KB L2 cache. Each GPC has two PolyMorph engines, eight in total. The ROP partitions sit next to the L2 cache, and each shader cluster is tied to its own L1 and the shared L2 cache. Shading performance is going to be increased quite a bit, and geometry performance will get a nice boost as well.
NVIDIA is using 64 KB of Shared Memory/L1 per SMX. Please note that they have a 16/48 or 48/16 split here for graphics/compute, as before with Fermi. For L2, there is 128 KB per 64-bit memory controller, so with four controllers that adds up to 512 KB of L2.
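The cache and bus totals can be sanity-checked with a short Python sketch. It assumes the four 64-bit memory controllers mentioned for the GK104 in this article; the constant names are made up for illustration.

```python
# Figures quoted in this article for the GK104.
MEMORY_CONTROLLERS = 4
CONTROLLER_WIDTH_BITS = 64
L2_KB_PER_CONTROLLER = 128
SHARED_L1_KB_PER_SMX = 64

# Totals across all memory controllers.
total_l2_kb = MEMORY_CONTROLLERS * L2_KB_PER_CONTROLLER
total_bus_width_bits = MEMORY_CONTROLLERS * CONTROLLER_WIDTH_BITS

# The two configurations quoted in the article as 16/48 and 48/16 (in KB).
shared_l1_splits_kb = [(16, 48), (48, 16)]
splits_fill_the_64_kb = all(
    a + b == SHARED_L1_KB_PER_SMX for a, b in shared_l1_splits_kb
)

print(f"Total L2 cache: {total_l2_kb} KB")
print(f"Total memory bus width: {total_bus_width_bits}-bit")
print(f"Both splits use the full 64 KB: {splits_fill_the_64_kb}")
```

The bus width total lines up with the 256-bit GDDR5 memory bus in the GK104 list earlier on this page.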
In regards to architectural changes, at the top of the pipeline NVIDIA has now added new PolyMorph 2.0 (world space processing) engines and raster (screen space processing) engines; they really act like a mini CPU.