ASUS GeForce GTX 670 DirectCU Mini review (Page 2)

Page 2 of 24 Published 2013-05-10 09:44 by Hilbert Hagedoorn

The Technology and Specs

Reference technology and specs

We'll first look at the reference (original design) specs and architecture. The GeForce GTX 670 is based on the Kepler GPU architecture, using the very same 28nm GK104 GPU found on the GeForce GTX 680. The GeForce GTX 670 has 1344 CUDA (shader) cores whereas the GeForce GTX 680 has 1536. That's 192 shader cores fewer, which is precisely one CUDA core cluster (SM) out of the eight available. The product is PCI-Express 3.0 ready and has a TDP of around 170 Watts (with a typical draw of 150~160W). But let me first show you the GK104 die:

GeForce GTX 680
The NVIDIA GK104 Kepler architecture GPU. You can see the eight SM (CUDA/shader core) clusters; one of these has been deactivated for the GTX 670.
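The core counts quoted above follow directly from the number of active clusters. Here is a minimal Python sketch of that arithmetic; the function and constant names are made up for illustration, and only the numbers come from the text.

```python
CORES_PER_SM = 192  # shader cores in one GK104 cluster, as quoted above
TOTAL_SM = 8        # clusters on a fully enabled GK104


def active_cuda_cores(disabled_sm: int = 0) -> int:
    """Return the shader core count for a GK104 with some clusters disabled."""
    if not 0 <= disabled_sm <= TOTAL_SM:
        raise ValueError("disabled_sm must be between 0 and 8")
    return (TOTAL_SM - disabled_sm) * CORES_PER_SM


print("GTX 680:", active_cuda_cores(0))  # full chip
print("GTX 670:", active_cuda_cores(1))  # one cluster switched off
```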

An immediate difference from the previous generation is that the GPU core and the shader processor domain are now clocked 1:1, meaning both run at 915 MHz. The boost clock for reference GTX 670 cards is set at 980 MHz, though that can vary a bit per card and with the available power envelope (topping 1 GHz is common). As far as memory is concerned, the boards feature a 256-bit memory bus connected to 2 GB of GDDR5 video memory. On the memory controller side you'll see very significant improvements, as the reference memory clock is set at 6 GHz (6 Gbps effective). This boils down to a memory bandwidth of 192 GB/s on that 256-bit memory bus. These graphics adapters are of course DirectX 11.1 ready. With Windows 8, 7 and Vista all being DX11 ready, all we need are more new games that take advantage of DirectCompute, multi-threading, hardware tessellation and the latest Shader Model 5.0 extensions.
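For those who want to check the bandwidth figure, here is a small Python sketch of the usual peak-bandwidth arithmetic (effective data rate times bus width in bytes). The function name is hypothetical, and a GB is assumed to be 10^9 bytes. The inputs are the effective memory clocks and bus widths quoted in this article.

```python
def memory_bandwidth_gb_s(effective_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak theoretical memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8  # bus width converted to bytes
    return effective_clock_mhz * 1e6 * bytes_per_transfer / 1e9


# GTX 670 / GTX 680: 6008 MHz effective on a 256-bit bus
print(f"GTX 670: {memory_bandwidth_gb_s(6008, 256):.1f} GB/s")
# GTX 580: 4000 MHz effective on a 384-bit bus
print(f"GTX 580: {memory_bandwidth_gb_s(4000, 384):.1f} GB/s")
```

Both land at roughly 192 GB/s: the narrower 256-bit Kepler bus makes up for its width with the much higher memory clock.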

For your reference, here's a quick overview of some past generation high-end GeForce cards compared with the new Kepler based GeForce cards.

	GeForce GTX 480	GeForce GTX 580	GeForce GTX 670	ASUS GTX 670 DCU Mini	GeForce GTX 680	GeForce GTX 690
Stream (Shader) Processors	480	512	1344	1344	1536	3072
Core Clock (MHz)	700	772	915	928	1006	915
Shader Clock (MHz)	1400	1544	-	-	-	-
Boost Clock (MHz)	-	-	980	1033	1058	1019
Memory Clock (effective MHz)	3700	4000	6008	6008	6008	6008
Memory amount (MB)	1536	1536	2048	2048	2048	4096
Memory Interface	384-bit	384-bit	256-bit	256-bit	256-bit	256-bit
Memory Type	GDDR5	GDDR5	GDDR5	GDDR5	GDDR5	GDDR5

For Kepler, NVIDIA kept its memory controllers GDDR5 compatible. Memory-wise, NVIDIA offers nice large memory volumes thanks to the architecture; 2 GB is standard these days for most of NVIDIA's high-end series 600 graphics cards. NVIDIA's hardware engineers reworked the memory subsystem quite a bit, enabling much higher memory clock frequencies compared to previous generation GeForce GPUs. The result is memory speeds of up to 6 Gbps. Each memory partition uses one memory controller on the respective GPU, which gets 256 or 512 MB of memory tied to it.

The GTX 580 has six memory controllers (6x256MB) = 1536 MB of GDDR5 memory
The GTX 670 has four memory controllers (4x512MB) = 2048 MB of GDDR5 memory
The GTX 680 has four memory controllers (4x512MB) = 2048 MB of GDDR5 memory

As mentioned in the introduction, a 4 GB version would be very possible as well. It all depends on the board partners.
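The memory math above is simple enough to sketch in a few lines of Python. Two assumptions are baked in and labeled in the code: each memory controller is taken to be 64 bits wide (which matches the bus widths in the spec table), and a 4 GB board is assumed to simply double the memory per controller. Names are illustrative only.

```python
# Assumption: every memory controller is 64 bits wide.
CONTROLLER_WIDTH_BITS = 64

# (number of memory controllers, MB of GDDR5 per controller)
CARDS = {
    "GeForce GTX 580": (6, 256),
    "GeForce GTX 670": (4, 512),
    "GeForce GTX 680": (4, 512),
    # Assumption: a 4 GB board doubles the memory behind each controller.
    "Hypothetical 4 GB GTX 670": (4, 1024),
}


def memory_config(controllers: int, mb_per_controller: int) -> tuple[int, int]:
    """Return (total memory in MB, memory bus width in bits)."""
    return controllers * mb_per_controller, controllers * CONTROLLER_WIDTH_BITS


for name, (controllers, mb_each) in CARDS.items():
    total_mb, bus_bits = memory_config(controllers, mb_each)
    print(f"{name}: {total_mb} MB on a {bus_bits}-bit bus")
```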

The graphics architecture that is Kepler

As you can understand, the combination of large memory partitions, bus width and GDDR5 memory (quad data rate) allows the GPU to work with a very high (effective) framebuffer bandwidth. Let's again put most of the data in a chart to get a better overview of the changes:

Graphics card	GeForce GTX 580	GeForce GTX 670	ASUS GTX 670 DCU Mini	GeForce GTX 680	GeForce GTX 690
Fabrication node	40nm	28nm	28nm	28nm	28nm
Shader processors	512	1344	1344	1536	3072
Streaming Multiprocessors (SM)	16	7	7	8	16
Texture Units	64	112	112	128	128x2
ROP units	48	32	32	32	32x2
Graphics Clock (Core / Boost)	772 MHz	915 / 980 MHz	928 / 1033 MHz	1006 / 1058 MHz	915 / 1019 MHz
Shader Processor Clock	1544 MHz	915 / 980 MHz	928 / 1033 MHz	1006 / 1058 MHz	915 / 1019 MHz
Memory Clock / Data rate (MHz)	1000 / 4000	1502 / 6008	1502 / 6008	1502 / 6008	1502 / 6008
Graphics memory	1536 MB	2048 MB	2048 MB	2048 MB	4096 MB
Memory interface	384-bit	256-bit	256-bit	256-bit	256-bit
Memory bandwidth	192 GB/s	192 GB/s	192 GB/s	192 GB/s	192 GB/s
Power connectors	1x6-pin PEG, 1x8-pin PEG	2x6-pin PEG	2x6-pin PEG	2x6-pin PEG	2x8-pin PEG
Max board power (TDP)	244 Watts	170 Watts	180 Watts	195 Watts	300 Watts
Recommended Power supply	600 Watts	550 Watts	550 Watts	550 Watts	750 Watts
GPU Thermal Threshold	97 degrees C	98 degrees C	98 degrees C	98 degrees C	98 degrees C

So we talked about the core clocks, specifications and memory partitions. Obviously there's a lot more to discuss, the GPU architecture for example. To understand a graphics processor you simply need to break it down into pieces. Let's first look at the raw data that most of you can understand and grasp. This bit is about the Kepler architecture; if you're not interested in g33k talk, by all means please browse to the next page.

GeForce GTX 680

So above we see the GK104 block diagram, which lays out the Kepler architecture. Let's break it down into bits and pieces. A fully operating GK104 will have:

1536 CUDA processors (Shader cores)
8 CUDA core clusters (SM), each with 192 cores
8 geometry units
4 raster units
128 Texture Units
32 ROP engines
256-bit GDDR5 memory bus
DirectX 11.1

That, then, is a fully operating GK104 as used on the GTX 680. The GTX 670 uses the same chip, but has one SM (CUDA / shader core cluster) disabled. So the more important thing to focus on is the SM (block of shader processors) cluster, or SMX as NVIDIA likes to call it for the GTX 680, which has 192 shader processors. That's radically different from Fermi; the GeForce GTX 580, for example, had 32 shader processors per SM cluster. 1536 / 192 = 8 shader clusters (SMs). Let's blow up one such cluster:

GeForce GTX 680

Above is the block diagram for a single shader processor cluster, aka SM, or SMX as NVIDIA now calls it. The new SMX has quite a bit more bite in terms of shader, texture and geometry processing. With 192 CUDA cores, it has six times the number of cores per SM compared to Fermi. Now, at the end of the pipeline we run into the ROP (Raster Operation) engines, and the GTX 680 again has 32 of them for features like pixel blending and AA. There's a total of 128 texture filtering units available for the GeForce GTX 680. The math is simple here: each SM has 16 texture units tied to it.

GeForce GTX 580 has 16 SMs x 4 texture units = 64
GeForce GTX 670 has 7 SMs x 16 texture units = 112
GeForce GTX 680 has 8 SMs x 16 texture units = 128
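The per-cluster comparison between Fermi and Kepler can be put into a quick Python sketch as well. The class and variable names are invented for this illustration; the per-cluster figures (32 cores and 4 texture units for Fermi, 192 cores and 16 texture units for Kepler) are the ones quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderCluster:
    """Resources inside a single SM / SMX."""
    cores: int
    texture_units: int


FERMI_SM = ShaderCluster(cores=32, texture_units=4)
KEPLER_SMX = ShaderCluster(cores=192, texture_units=16)

# card name -> (cluster type, number of active clusters)
CARDS = {
    "GeForce GTX 580": (FERMI_SM, 16),
    "GeForce GTX 670": (KEPLER_SMX, 7),
    "GeForce GTX 680": (KEPLER_SMX, 8),
}


def texture_units(cluster: ShaderCluster, count: int) -> int:
    """Total texture units for `count` active clusters."""
    return cluster.texture_units * count


for name, (cluster, count) in CARDS.items():
    print(f"{name}: {texture_units(cluster, count)} texture units")

# Cores per cluster, Kepler versus Fermi
print("Core ratio per cluster:", KEPLER_SMX.cores // FERMI_SM.cores)
```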

Above is the GK104 host interface: the GigaThread engine, four GPCs, four memory controllers, the ROP partitions and a 512 KB L2 cache. Across the four GPCs there are eight PolyMorph engines, one per SMX. The ROP partitions sit close to the L2 cache, and each shader cluster is tied to its own L1 and the shared L2 cache. Shading performance is going to be increased quite a bit, and geometry performance will get a nice boost as well. NVIDIA uses 64 KB of Shared Memory/L1 per SMX; please note that this can be split 16/48 or 48/16 for graphics/compute, as before with Fermi. For L2, there is 128 KB per 64-bit memory controller, so that adds up to 512 KB of L2.
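The cache numbers can be checked the same way. Below is a minimal Python sketch, with hypothetical names, of the L2 total (128 KB per memory controller across four controllers) and the two Shared Memory/L1 splits of the 64 KB pool mentioned above. It only models the two splits named in the text.

```python
L2_PER_CONTROLLER_KB = 128  # L2 slice behind each 64-bit memory controller
MEMORY_CONTROLLERS = 4      # GK104 has four controllers
SHARED_L1_POOL_KB = 64      # combined Shared Memory / L1 per SMX


def l2_total_kb(controllers: int = MEMORY_CONTROLLERS) -> int:
    """Total L2 cache in KB for the given number of memory controllers."""
    return controllers * L2_PER_CONTROLLER_KB


def split_pool(shared_kb: int) -> tuple[int, int]:
    """Return (shared memory KB, L1 KB) for one of the two splits in the text."""
    if shared_kb not in (16, 48):
        raise ValueError("only the 16/48 and 48/16 splits are modeled here")
    return shared_kb, SHARED_L1_POOL_KB - shared_kb


print("L2 total:", l2_total_kb(), "KB")
print("Splits (shared, L1):", split_pool(16), split_pool(48))
```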

In regards to architectural changes, at the top of the pipeline NVIDIA has now added new PolyMorph 2.0 (world space processing) engines and raster (screen space processing) engines; they act like a mini CPU really.
