Gigabyte GeForce GTX 780 Ti WindForce 3X OC review (Page 5)

Page 5 of 28. Published 2013-11-12 08:30 by Hilbert Hagedoorn

Kepler GK110 Revision B Graphics Architecture

As you can understand, the wide memory partitions, the bus width and the GDDR5 memory (quad data rate) together give the GPU a very high effective framebuffer bandwidth. Let's again put most of the data in a chart to get a better overview of the changes:

Graphics card	GeForce GTX 480	GeForce GTX 580	GeForce GTX 680	GeForce GTX 780	GeForce GTX Titan	GeForce GTX 780 Ti
Fabrication node	40nm	40nm	28nm	28nm	28nm	28nm
Shader processors	480	512	1536	2304	2688	2880
Streaming Multiprocessors (SMX)	15	16	8	12	14	15
Texture Units	60	64	128	192	224	240
ROP units	48	48	32	48	48	48
Graphics Clock (Core)	700 MHz	772 MHz	1006/1058 MHz	863/900 MHz	836/876 MHz	875/928 MHz
Shader Processor Clock	1401 MHz	1544 MHz	1006/1058 MHz	863/900 MHz	836/876 MHz	875/928 MHz
Memory Clock / Data rate	924 MHz / 3696 MHz	1000 MHz / 4000 MHz	1502 MHz / 6008 MHz	1502 MHz / 6008 MHz	1502 MHz / 6008 MHz	1750 MHz / 7000 MHz
Graphics memory	1536 MB	1536 MB	2048 MB	3072 MB	6144 MB	3072 MB
Memory interface	384-bit	384-bit	256-bit	384-bit	384-bit	384-bit
Memory bandwidth	177 GB/s	192 GB/s	192 GB/s	288 GB/s	288 GB/s	336 GB/s
Power connectors	1x6-pin PEG, 1x8-pin PEG	1x6-pin PEG, 1x8-pin PEG	2x6-pin PEG	1x6-pin PEG, 1x8-pin PEG	1x6-pin PEG, 1x8-pin PEG	1x6-pin PEG, 1x8-pin PEG
Max board power (TDP)	250 Watts	244 Watts	170 Watts	250 Watts	250 Watts	250 Watts
Recommended Power supply	600 Watts	600 Watts	550 Watts	600 Watts	600 Watts	600 Watts
GPU Thermal Threshold	105 degrees C	97 degrees C	98 degrees C	95 degrees C	95 degrees C	95 degrees C

So we talked about the core clocks, specifications and memory partitions. Obviously there's a lot more to talk through. We feel that to understand a graphics processor, you simply need to break it down into small pieces. Let's first look at the raw data that most of you can grasp. This bit is about the Kepler GK110B architecture; if you're not interested in geek talk, by all means please browse to the next page.

Right, so have a close look at the GK110 die as shown above. You'll notice the five green clusters. These are the GPCs (graphics processing clusters), each containing 3 SMX (streaming multiprocessor) clusters, 5 x 3 = 15 SMX clusters in total. You'll spot six 64-bit memory interfaces, which together form a 384-bit path towards the graphics memory. That's instant extra memory bandwidth by the way: combined with a 7 Gbps effective data rate, the card can reach 336 GB/sec.
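For those who like to check the math, here is a minimal sketch of how that bandwidth number falls out of bus width and data rate, using the figures from the chart above. The function name and the dictionary layout are our own choices for illustration, nothing official.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin data rate (Gbps) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8

# (memory interface in bits, effective data rate in Gbps), taken from the spec chart
cards = {
    "GeForce GTX 480": (384, 3.696),
    "GeForce GTX 580": (384, 4.000),
    "GeForce GTX 680": (256, 6.008),
    "GeForce GTX 780": (384, 6.008),
    "GeForce GTX Titan": (384, 6.008),
    "GeForce GTX 780 Ti": (384, 7.000),
}

bandwidth = {name: memory_bandwidth_gbs(*spec) for name, spec in cards.items()}

for name, gbs in bandwidth.items():
    print(f"{name}: {gbs:.0f} GB/s")
```

The rounded results line up with the "Memory bandwidth" row of the chart, which is a nice sanity check on the listed clocks.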

So above we see the GK110 block diagram, which lays out the Kepler architecture. Let's break it down into bits and pieces. The GK110B will have:

2880 (GTX 780 Ti) or 2688 (Titan) or 2304 (GTX 780) CUDA processors (Shader cores)
There are 192 CUDA cores (shader processors) per cluster (SMX).
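The shader counts follow directly from the number of active SMX clusters. A quick sketch, assuming the SMX counts from the chart (the variable names are ours):

```python
CORES_PER_SMX = 192  # CUDA cores (shader processors) per SMX on Kepler

# Active SMX clusters per card, as listed in the spec chart
active_smx = {
    "GeForce GTX 780": 12,
    "GeForce GTX Titan": 14,
    "GeForce GTX 780 Ti": 15,  # the full chip: 5 GPCs x 3 SMX
}

shader_cores = {card: smx * CORES_PER_SMX for card, smx in active_smx.items()}

for card, cores in shader_cores.items():
    print(f"{card}: {active_smx[card]} SMX x {CORES_PER_SMX} = {cores} shader processors")
```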

The more important thing to focus on is the SMX (a block, or cluster, of shader processors), each of which has 192 shader processors.

SMX: 192 single‐precision CUDA cores, 64 double‐precision units, 32 special function units (SFU), and 32 load/store units.

When we zoom in even further on one SMX cluster (192 shader processors) we see a change from the GK104 (GTX 680), as there are 64 double-precision math units.

See, the GeForce GTX 680 SMX had 192 single-precision (SP) floating point CUDA cores and 8 double-precision (DP) CUDA cores. As a result, DP operations per clock ran at effectively 1/24 the SP rate. The same DP rate applies to the GTX 780 Ti.

The one exception remains the GTX Titan: it includes a full 64 DP CUDA cores per SMX (compared to 192 SP CUDA cores), or 1/3rd the number of DP cores to SP, for substantially more double-precision horsepower. So a full GK110 chip with 15 SMXes has 960 DP units linked to its total of 2,880 CUDA cores; on the GTX Titan with its 14 activated SMXes that works out to 896 DP units. Double precision wise, to unlock full performance, you must open the Nvidia Control Panel and navigate to “Manage 3D Settings”. In the Global Settings box you will find an option titled “CUDA – Double Precision” which needs to be enabled, but... the GeForce GTX Titan will run at reduced clock speeds when full double-precision is enabled. Still a great option if you are working on CUDA applications.
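To make the double-precision ratios a little more tangible, here is a small sketch of the arithmetic. It only restates the per-SMX unit counts mentioned in the text; the helper names are hypothetical.

```python
from fractions import Fraction

SP_PER_SMX = 192  # single-precision CUDA cores per SMX

def dp_to_sp_ratio(dp_per_smx):
    """DP rate relative to SP, from the number of DP units active per SMX."""
    return Fraction(dp_per_smx, SP_PER_SMX)

def total_dp_units(smx_count, dp_per_smx=64):
    """Total DP units on a chip with the given number of active SMX clusters."""
    return smx_count * dp_per_smx

print(dp_to_sp_ratio(8))    # 8 DP units per SMX, the GTX 680 style rate
print(dp_to_sp_ratio(64))   # 64 DP units per SMX, the GTX Titan style rate
print(total_dp_units(15))   # full 15 SMX GK110
print(total_dp_units(14))   # 14 SMX, as on the GTX Titan
```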

The SMX has quite a bit more bite in terms of shader, texture and geometry processing. For the GeForce GTX 780 Ti that means 192 CUDA cores per SMX, six times the number of cores per SM compared to Fermi. In the pipeline we run into the ROP (Raster Operation) engine, and the GK110 has 48 of them for features like pixel blending and AA. The GK110 has 64KB of L1 cache for each SMX plus a special 48KB texture unit memory that can be utilized as a read-only cache. L2 cache wise, the GK110 has 1.5MB shared across the SMX units. The GPU’s texture units are a valuable resource for compute programs that need to sample or filter image data. Texture throughput in Kepler is significantly higher than in Fermi: each SMX unit contains 16 texture filtering units.

GeForce GTX 580 has 16 SM x 4 Texture units = 64
GeForce GTX 680 has 8 SMX x 16 Texture units = 128
GeForce GTX 780 has 12 SMX x 16 Texture units = 192
GeForce GTX Titan has 14 SMX x 16 Texture units = 224
GeForce GTX 780 Ti has 15 SMX x 16 Texture units = 240

So there's a total of 15 SMX x 16 TU = 240 texture filtering units available on the GK110 silicon itself (if all SMXes were enabled). Still with me?
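The same tally can be written as a tiny script, handy if you want to compare the Fermi and Kepler layouts side by side. The numbers are the ones listed above; the structure is just our illustration.

```python
# (multiprocessor count, texture units per multiprocessor) per card
texture_config = {
    "GeForce GTX 580": (16, 4),     # Fermi: 4 texture units per SM
    "GeForce GTX 680": (8, 16),     # Kepler: 16 texture units per SMX
    "GeForce GTX 780": (12, 16),
    "GeForce GTX Titan": (14, 16),
    "GeForce GTX 780 Ti": (15, 16),
}

texture_units = {card: count * per_unit for card, (count, per_unit) in texture_config.items()}

for card, total in texture_units.items():
    print(f"{card}: {total} texture units")
```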
