Gigabyte GeForce GTX 950 Xtreme Gaming review



Maxwell Graphics Architecture

Let's again put most of the data in a chart for a better overview of the changes:

| GeForce (reference) | GTX 780 | GTX 780 Ti | GTX Titan Black | GTX 950 | GTX 960 | GTX 970 | GTX 980 |
|---|---|---|---|---|---|---|---|
| Fabrication node | 28 nm | 28 nm | 28 nm | 28 nm | 28 nm | 28 nm | 28 nm |
| Shader processors | 2304 | 2880 | 2880 | 768 | 1024 | 1664 | 2048 |
| Streaming Multiprocessors (SMX) | 12 | 15 | 15 | 6 | 8 | 13 | 16 |
| Texture Units | 192 | 240 | 240 | 48 | 64 | 104 | 128 |
| ROP units | 48 | 48 | 48 | 32 | 32 | 56 | 64 |
| GPU Clock (Core/Boost), MHz | 863/900 | 875/928 | 889/980 | 1064/1241 | 1127/1178 | 1050/1178 | 1126/1216 |
| Memory Clock / Data rate, MHz / Mbps | 1502/6008 | 1750/7000 | 1750/7000 | 1650/6600 | 1750/7000 | 1750/7000 | 1750/7000 |
| Graphics memory, MB | 3072 | 3072 | 6144 | 2048 | 2048/4096 | 4096 | 4096 |
| Memory interface | 384-bit | 384-bit | 384-bit | 128-bit | 128-bit | 256-bit | 256-bit |
| Memory bandwidth | 288 GB/s | 336 GB/s | 336 GB/s | 105 GB/s | 112 GB/s | 224 GB/s | 224 GB/s |
| Power connectors | 1x6-pin, 1x8-pin | 1x6-pin, 1x8-pin | 1x6-pin, 1x8-pin | 1x6-pin | 1x8-pin | 2x6-pin | 2x6-pin |
| Max board power (TDP) | 250 Watts | 250 Watts | 250 Watts | 90 Watts | 120 Watts | 145 Watts | 165 Watts |
| Recommended Power supply | 600 Watts | 600 Watts | 600 Watts | 400 Watts | 450 Watts | 500 Watts | 500 Watts |
| GPU Thermal Threshold | 95 degrees C | 95 degrees C | 95 degrees C | 95 degrees C | 95 degrees C | 95 degrees C | 95 degrees C |

So we talked about the core clocks, specifications and memory partitions. However, to better understand a graphics processor you need to break it down into smaller pieces. Let's first look at the raw data that most of you can understand and grasp. This bit is about the Maxwell architecture. NVIDIA’s “Maxwell” GPU architecture implements a number of architectural enhancements designed to extract even more performance per watt consumed.


So, above we see the GM206 block diagram that entails the Maxwell architecture; Nvidia actually started developing the GPU around 2011/2014. The two GPCs hold eight SMM (streaming multiprocessor) clusters in total, four per GPC. You'll spot the two 64-bit memory interfaces, which together form a 128-bit path to the graphics memory. At 7 Gbps the cards reach 112 GB/sec by default (GTX 960); the GTX 950 runs its memory at 6.6 Gbps, which works out to 105 GB/sec.
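
The bandwidth figures follow directly from the bus width and the memory data rate. Here is a minimal Python sketch of that arithmetic, using the reference values from the chart; the function name is our own and is only there for illustration.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_mbps: float) -> float:
    """Peak memory bandwidth in GB/s.

    bus_width_bits: total width of the memory interface in bits.
    data_rate_mbps: effective data rate per pin in Mbit/s (7000 = 7 Gbps).
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mbps / 1000


# Reference values taken from the chart: (bus width in bits, data rate in Mbps).
cards = {
    "GTX 950": (128, 6600),
    "GTX 960": (128, 7000),
    "GTX 980": (256, 7000),
    "GTX 780": (384, 6008),
}

for name, (bus, rate) in cards.items():
    print(f"{name}: {memory_bandwidth_gbs(bus, rate):.1f} GB/s")
```

For the GTX 960 this gives 16 bytes per transfer times 7000 Mbps, so 112 GB/s; the GTX 950 at 6600 Mbps lands at 105.6 GB/s, which the chart rounds down to 105.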

Let's break it down into bits and pieces. The GM206 will have:

  • 768 CUDA/Shader/Stream processors (GTX 950)
  • 128 CUDA cores (shader processors) per cluster
  • 2.94 billion transistors
  • 48 texture units
  • 32 ROP units
  • 128-bit GDDR5 @ 105 GB/s (112 GB/s on the GTX 960)
  • Texture Filtering Rate (Bilinear) 72.1 GigaTexels/sec (the GTX 960 figure: 64 texture units x 1127 MHz)

An important thing to focus on is the SM (streaming multiprocessor) clusters, called SMM on Maxwell, each of which has 128 shader processors. Let's zoom in even further.

One SMM: 128 single-precision shader cores, double-precision units, special function units (SFU), and load/store units.

So, based on a 6 SMM, 768 shader core chip, the SMM looks fairly familiar in design. In the pipeline we run into the ROP (Raster Operation) engine, and the GM206 has 32 of them for features like pixel blending and AA.

Each Maxwell SM features its own dedicated 96KB shared memory, while the L1/texture caching functions have been combined into a 24KB pool of memory per pair of processing blocks (48KB per SMM). GM206 also ships with 1MB of L2 cache that’s shared across the GPU. With more built-in cache, fewer requests to graphics DRAM are needed, which improves performance and reduces power consumption. In addition, NVIDIA's third-generation delta color compression engine offers new modes for color compression, allowing the GPU to use its available memory bandwidth more effectively.

The texture throughput per SM is significantly lower than on Kepler: each Maxwell SMM contains 8 texture filtering units, where a Kepler SMX has 16.
  • GeForce GTX 950 has 6 SMM x 8 Texture units = 48
  • GeForce GTX 960 has 8 SMM x 8 Texture units = 64
  • GeForce GTX 970 has 13 SMM x 8 Texture units = 104
  • GeForce GTX 980 has 16 SMM x 8 Texture units = 128
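
The same multiplication also gives a rough peak bilinear texel rate once you factor in the core clock. The sketch below assumes one bilinear texel per texture unit per clock and uses the base clock rather than the boost clock; both are simplifying assumptions on our side, and the function names are hypothetical.

```python
TEXTURE_UNITS_PER_SM = 8  # per Maxwell SM, as listed above


def texture_units(sm_count: int) -> int:
    """Total texture filtering units for a given number of enabled SMs."""
    return sm_count * TEXTURE_UNITS_PER_SM


def bilinear_texel_rate(sm_count: int, core_clock_mhz: float) -> float:
    """Peak bilinear rate in GigaTexels/sec.

    Assumption: each texture unit filters one bilinear texel per clock.
    """
    return texture_units(sm_count) * core_clock_mhz / 1000


# Enabled SM counts from the list above.
for name, sms in [("GTX 950", 6), ("GTX 960", 8), ("GTX 970", 13), ("GTX 980", 16)]:
    print(f"{name}: {texture_units(sms)} texture units")

# GTX 960 at its 1127 MHz reference base clock.
print(f"GTX 960: {bilinear_texel_rate(8, 1127):.1f} GigaTexels/sec")
```

With 64 texture units at 1127 MHz the GTX 960 comes out at roughly 72.1 GigaTexels/sec.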

So there's a total of up to 6 SMM x 8 TU = 48 texture filtering units available on the GTX 950. Typically lower is worse, but these cards require little voltage and can be clocked very high, and that's where performance kicks in at low power consumption.

To reduce DRAM bandwidth demands, NVIDIA GPUs make use of lossless compression techniques as data is written out to memory. The bandwidth savings from this compression are realized a second time when clients such as the Texture Unit later read the data. As illustrated in the preceding figure, the compression engine has multiple layers of compression algorithms.

Any block going out to memory will first be examined to see if 4x2 pixel regions within the block are constant, in which case the data will be compressed 8:1 (i.e., from 256B to 32B of data, for 32b color). If that fails, but 2x2 pixel regions are constant, the data will be compressed 4:1. These modes are effective for AA surfaces, but less so for 1xAA rendering. Therefore, starting with Fermi, Nvidia also implemented support for a “delta color compression” mode. In this mode, the GPU calculates the difference between each pixel in the block and its neighbour, and then tries to pack these difference values together using the minimum number of bits. For example, if pixel A’s red value is 253 (8 bits) and pixel B’s red value is 250 (also 8 bits), the difference is 3, which can be represented in only 2 bits. If the block cannot be compressed in any of these modes, the GPU will write out the data uncompressed, preserving the lossless rendering requirement.
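
To make the order of these checks concrete, here is a toy Python sketch of the decision cascade. It is not NVIDIA's actual algorithm, which is not public at this level of detail: it works on a single 8-bit channel, compares each pixel with the previous one in simple row order, and uses a made-up `delta_bit_budget` threshold to decide whether delta mode is worthwhile. All function names are hypothetical.

```python
def regions_constant(block, region_w, region_h):
    """True if every region_w x region_h tile of the block holds one value.

    Assumes the block dimensions are multiples of the region size.
    """
    height, width = len(block), len(block[0])
    for top in range(0, height, region_h):
        for left in range(0, width, region_w):
            tile = {block[y][x]
                    for y in range(top, top + region_h)
                    for x in range(left, left + region_w)}
            if len(tile) != 1:
                return False
    return True


def bits_for_delta(a, b):
    """Bits needed for the magnitude of the difference (sign ignored)."""
    return max(1, abs(a - b).bit_length())


def choose_mode(block, delta_bit_budget=4):
    """Pick a mode in the order described in the text.

    delta_bit_budget is an illustrative assumption, not a hardware value.
    """
    if regions_constant(block, 4, 2):
        return "8:1"
    if regions_constant(block, 2, 2):
        return "4:1"
    flat = [value for row in block for value in row]
    if all(bits_for_delta(a, b) <= delta_bit_budget
           for a, b in zip(flat, flat[1:])):
        return "delta"
    return "uncompressed"


# Four 8x8 single-channel test blocks.
flat_block = [[7] * 8 for _ in range(8)]
tile_block = [[10 * ((x // 2) % 2) for x in range(8)] for y in range(8)]
gradient_block = [[100 + x + y for x in range(8)] for y in range(8)]
checker_block = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]

print(bits_for_delta(253, 250))   # the 253 vs 250 example from the text
for block in (flat_block, tile_block, gradient_block, checker_block):
    print(choose_mode(block))
```

A flat surface takes the 8:1 path, the block with constant 2x2 tiles falls through to 4:1, the smooth gradient ends up in delta mode, and the checkerboard defeats all three and is written uncompressed.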

The effectiveness of delta color compression depends on the specifics of which pixel ordering is chosen for the delta color calculation. Maxwell contains the third generation of delta color compression, which improves effectiveness by offering more choices of delta calculation to the compressor. Thanks to the improvements in caching and compression in Maxwell, the GPU is able to significantly reduce the number of bytes that have to be fetched from memory per frame. Maxwell uses roughly 25% fewer bytes per frame compared to Kepler.
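
One way to read that 25% figure is as an increase in effective bandwidth: if every frame needs a quarter fewer bytes, the same memory bus behaves like a faster one. The sketch below works through that arithmetic. It assumes the saving applies uniformly to all memory traffic, which is a simplification on our part, and the function name is hypothetical.

```python
def effective_bandwidth_gbs(raw_gbs: float, bytes_saved_fraction: float) -> float:
    """Bandwidth an uncompressed design would need to move the same frames.

    Assumption: the byte saving applies evenly to all memory traffic.
    """
    if not 0.0 <= bytes_saved_fraction < 1.0:
        raise ValueError("bytes_saved_fraction must be in [0, 1)")
    return raw_gbs / (1.0 - bytes_saved_fraction)


# Raw bandwidth of the 128-bit cards, with the roughly 25% saving quoted above.
for name, raw in [("GTX 950", 105.6), ("GTX 960", 112.0)]:
    effective = effective_bandwidth_gbs(raw, 0.25)
    print(f"{name}: {raw:.1f} GB/s raw, about {effective:.1f} GB/s effective")
```

Under that assumption the 112 GB/s of the GTX 960 behaves like roughly 149 GB/s, and the 105.6 GB/s of the GTX 950 like roughly 141 GB/s.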

The GeForce GTX 950’s GM206 GPU ships with a new video engine that natively supports H.265 (HEVC) encode and decode in hardware, while the GTX 950’s display engine supports up to four displays at up to 5K (5120x3200) resolution.
