Guru3D.com
  • HOME
  • NEWS
    • Channels
    • Archive
  • DOWNLOADS
    • New Downloads
    • Categories
    • Archive
  • GAME REVIEWS
  • ARTICLES
    • Rig of the Month
    • Join ROTM
    • PC Buyers Guide
    • Guru3D VGA Charts
    • Editorials
    • Dated content
  • HARDWARE REVIEWS
    • Videocards
    • Processors
    • Audio
    • Motherboards
    • Memory and Flash
    • SSD Storage
    • Chassis
    • Media Players
    • Power Supply
    • Laptop and Mobile
    • Smartphone
    • Networking
    • Keyboard Mouse
    • Cooling
    • Search articles
    • Knowledgebase
    • More Categories
  • FORUMS
  • NEWSLETTER
  • CONTACT

New Reviews
Intel NUC 13 Pro (Arena Canyon) review
Endorfy Arx 700 Air chassis review
Beelink SER5 Pro (Ryzen 7 5800H) mini PC review
Crucial T700 PCIe 5.0 NVMe SSD Review - 12GB/s
Sapphire Radeon RX 7600 PULSE review
Gainward GeForce RTX 4060 Ti GHOST review
Radeon RX 7600 review
ASUS GeForce RTX 4060 Ti TUF Gaming review
MSI GeForce RTX 4060 Ti Gaming X TRIO review
GeForce RTX 4060 Ti 8GB (FE) review

New Downloads
AMD Radeon Software Adrenalin 23.5.2 WHQL download
Intel ARC graphics Driver Download Version: 31.0.101.4382
CrystalDiskInfo 9.0.1 Download
Corsair Utility Engine Download (iCUE) Download v5.2
GeForce 535.98 WHQL driver download
CPU-Z download v2.06
AMD Radeon Software Adrenalin 23.5.1 WHQL download
GeForce 532.03 WHQL driver download
AMD Chipset Drivers Download 5.05.16.529
Display Driver Uninstaller Download version 18.0.6.4


New Forum Topics
NVIDIA GeForce Game Ready 532.03 WHQL Download & Discussion NVIDIA GeForce Hotfix Driver 536.09 AI drone tried to remove its human operator from commanding - under simulation RTX 4090 Owner's thread AMD Software: Adrenalin Edition 23.5.2 - Driver Download and Discussion LG Unveils the UltraGear 27GR75Q-B: A High-Performance 27-inch WQHD Gaming Monitor Replacing the RX6600 XT Aliexpress "3070M" Desktop card driver Ryzen 7800X3D ... MSI B650 Edge (voltage?) AMD Software: Adrenalin Edition 23.5.1 - Driver Download and Discussion




Guru3D.com » Review » MSI GeForce GTX 970 Gaming OC review » Page 5

MSI GeForce GTX 970 Gaming OC review - Maxwell Graphics Architecture

by Hilbert Hagedoorn on: 09/19/2014 04:31 AM [ 5] 70 comment(s)

Tweet

Maxwell Graphics Architecture

As you can understand, the massive memory partitions, bus-width and combination of GDDR5 memory (quad data rate) allows the GPU to work with a very high framebuffer bandwidth (effective). Let's again put most of the data in a chart to get an idea and better overview of changes:

GeForce GTX 780 GTX Titan GTX 780 Ti GTX Titan Black GTX 970 GTX 980
Fabrication node 28nm 28nm 28nm 28nm 28nm 28nm
Shader processors 2304 2688 2880 2880 1664 2048
Streaming Multiprocessors (SMX) 12 14 15 15 13 16
Texture Units 192 224 240 240 104 128
ROP units 48 48 48 48 56 64
GPU Clock (Core/Boost) 863/900 836/876 875/928 889/980 1050/1178 1126/1216
Memory Clock / Data rate 1502/6008 1502/6008 1750/7000 1750/7000 1750/7000 1750/7000
Graphics memory 3072 6144 3072 6144 4096 4096
Memory interface 384-bit 384-bit 384-bit 384-bit 256-bit 256-bit
Memory bandwidth 288 GB/s 288 GB/s 336 GB/s 336 GB/s 224 GB/s 224 GB/s
Power connectors 1x6-pin PEG, 1x8-pin PEG 1x6-pin PEG, 1x8-pin PEG 1x6-pin PEG, 1x8-pin PEG 1x6-pin PEG, 1x8-pin PEG 2x6-pin PEG 2x6-pin PEG
Max board power (TDP) 250 Watts 250 Watts 250 Watts 250 Watts 145 Watts 165Watts
Recommended Power supply 600 Watts 600 Watts 600 Watts 600 Watts 500 Watts 500 Watts
GPU Thermal Threshold 95 degrees C 95 degrees C 95 degrees C 95 degrees C 95 degrees C 95 degrees C

So we talked about the core clocks, specifications and memory partitions. Obviously there's a lot more to talk through. We feel that to be able to understand a graphics processor, you simply need to break it down into small pieces to better understand it. Let's first look at the raw data that most of you can understand and grasp. This bit will be about the Maxwell GM204 architecture. NVIDIA’s  “Maxwell” GPU architecture implements a number of architectural enhancements designed to extract even more performance and more power efficiency per watt consumed. The first Maxwell-based GPU was codenamed “GM107” (GTX 750 and 750 Ti). Overall the architecture was designed with power-limited environments in mind. These products are now faster than the 780 cards, yet only consume 165 Watts. Actually, this small statement sums things up nicely, there is 40% more performance delivered per processor over the last gen architecture.

 

Right, so have a close look at the die as shown above. You'll notice the green clusters. These are the GPC engines; each GPC has 4 SMX/SMM (streaming multi processor) clusters with 32 streaming core processing blocks in total. You'll spot four 64-bit memory interfaces, bringing in a 256-bit path to the graphics memory. At 7 Gbps default that's instant extra memory bandwidth by the way, the cards can reach 224 GB/sec.
 

 


So above, we see the GM204 block diagram that entails the Maxwell architecture, Nvidia started developing the GPU around 2011 actually. Let's break it down into bits and pieces. The GM204 will have:

  • 1664 (GTX 970) or 2048 (GTX 980) CUDA/Shader/Stream processors are used
  • There are 128 CUDA cores (shader processors) per cluster (SMM)
  • 5.2 Billion Transistors
  • 2x Performance of the GK104
  • 16 SMM
  • 16 Geometry units
  • 104 / 128 Texture units
  • 64 ROP units
  • 256-bit GDDR5
An important thing to focus on is the SM (block of shader processors) clusters (SMX), which has 128 Shader processors. Let's zoom in even further.
 

One SMX: 128 single‐precision CUDA cores, double‐precision units, special function units (SFU), and load/store units.
 

So based on a full 16 SMM 2048 shader cores chip the SMX looks fairly familiar in design, in the pipeline we run into the ROP (Raster Operation) engine and the GM204 has a nice 64 engines for features like pixel blending and AA. The GPU has 64KB of L1 cache for each SMX plus a special 48KB texture unit memory that can be utilized as a read-only cache. L2 cache wise the GPU has 2 MB. The GPU’s Texture units are a valuable resource for compute programs with a need to sample or filter image data. The texture throughput is significantly decreased compared to Fermi – each SMX unit contains 8 texture filtering units.
  • GeForce GTX 780 has 12 SMX x 16 Texture units = 192
  • GeForce GTX 970 has 13 SMX x 8 Texture units = 104
  • GeForce GTX 980 has 16 SMX x 8 Texture units = 128

So there's a total of up-to 16 SMX x8 TU = 128 texture filtering units available for the silicon itself (once all SMXes were enabled). typically lower is worse, but these cards however require little voltage and can be clocked very high. And that's where performance kicks in at low power consumption.

To reduce DRAM bandwidth demands, NVIDIA GPUs make use of lossless compression techniques as data is written out to memory. The bandwidth savings from this compression is realized a second time when clients such as the Texture Unit later read the data. As illustrated in the preceding figure, our compression engine has multiple layers of compression algorithms.

Any block going out to memory will first be examined to see if 4x2 pixel regions within the block are constant, in which case the data will be compressed 8:1 (i.e., from 256B to 32B of data, for 32b color). If that fails, but 2x2 pixel regions are constant, they will compress the data 4:1. These modes are effective for AA surfaces, but less so for 1xAA rendering. Therefore, starting in Fermi Nvidia also implemented support for a “delta color compression” mode. In this mode, they calculate the difference between each pixel in the block and its neighbor, and then try to pack these different values together using the minimum number of bits. For example if pixel A’s red value is 253 (8 bits) and pixel B’s red value is 250 (also 8 bits), the difference is 3, which can be represented in only 2 bits. If the block cannot be compressed in any of these modes, then the GPU will write out data uncompressed, preserving the lossless rendering requirement.

The effectiveness of delta color compression depends on the specifics of which pixel ordering is chosen for the delta color calculation. Maxwell contains the third generation of delta color compression, which improves effectiveness by offering more choices of delta calculation to the compressor. Thanks to the improvements in caching and compression in Maxwell, the GPU is able to significantly reduce the number of bytes that have to be fetched from memory per frame. Maxwell uses roughly 25% fewer bytes per frame compared to Kepler. This means that from the perspective of the GPU core, a Kepler-style memory system running at 9.3Gbps would provide effective bandwidth similar to the bandwidth that Maxwell’s enhanced memory system provides.




27 pages « < 4 5 6 7 next »



Related Articles
MSI GeForce RTX 4060 Ti Gaming X TRIO review
MSI unveils the latest addition to their lineup, the MSI GeForce RTX 4060 Ti Gaming X TRIO. This graphics card boasts an impressive 8GB of VRAM and is equipped with a substantial cooling system and fa...

MSI GeForce RTX 4070 Gaming X TRIO review
We evaluate the MSI GeForce RTX 4070 Gaming X TRIO, an exceptional choice within the RTX 4070 range, known for its outstanding performance. Equipped with 12 GB of graphics memory and a triple-fan cool...

MSI GeForce RTX 4070 Ventus 3X review
We review the MSI GeForce RTX 4070 Ventus 3X, an affordable alternative in the RTX 4070 lineup that delivers top-notch performance. Classified by NVIDIA as a 12 GB MSRP (Manufacturer's Suggested Reta...

MSI GeForce RTX 4070 Ti Gaming X TRIO review
MSI has stepped up with their Gaming X TRIO GeForce RTX 4070 Ti. It is factory-tweaked (slightly) but looks great. It comes with whisper-quiet cooling, and is quite impressive in all respects....

© 2023