Guru3D.com

Gigabyte GeForce GTX 980 Ti G1 Gaming SOC Review - Maxwell GPU Architecture

by Hilbert Hagedoorn on: 06/11/2015 01:39 PM


Maxwell Graphics Architecture

Let's place the most important GPU data in a chart to get a better overview of the architectural changes (shaders, ROPs) and of where we are frequency-wise:
 

GeForce                 GTX Titan X          GTX Titan Black      GTX 980 Ti           GTX 980
GPU                     GM200                GK110B               GM200                GM204
Architecture            Maxwell              Kepler               Maxwell              Maxwell
Transistor count        8 billion            7.1 billion          8 billion            5.2 billion
Fabrication node        TSMC 28 nm           TSMC 28 nm           TSMC 28 nm           TSMC 28 nm
CUDA cores              3072                 2880                 2816                 2048
SMMs / SMXs             24                   15                   22                   16
ROPs                    96                   48                   96                   64
FP64 rate               1/32 FP32            1/3 FP32             1/32 FP32            1/32 FP32
GPU clock core / boost  1002 MHz / 1076 MHz  889 MHz / 980 MHz    1002 MHz / 1076 MHz  1127 MHz / 1216 MHz
Memory clock            1753 MHz             1753 MHz             1753 MHz             1753 MHz
Memory size             12 GB                6 GB                 6 GB                 4 GB
Memory bus              384-bit              384-bit              384-bit              256-bit
Memory bandwidth        337 GB/s             337 GB/s             337 GB/s             224 GB/s
FP32 performance        7.0 TFLOPS           5.20 TFLOPS          6.4 TFLOPS           4.61 TFLOPS
GPU thermal threshold   91 °C                95 °C                91 °C                95 °C
TDP                     250 W                250 W                250 W                165 W
Launch MSRP             $999                 $999                 TBA                  $549
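The bandwidth and FP rows in this table follow from a few simple formulas. The sketch below recomputes them from the other rows. It assumes GDDR5's four data transfers per listed memory clock and counts one fused multiply-add (2 FLOPs) per CUDA core per clock at the base clock; the function names are hypothetical, not taken from any Nvidia tool.

```python
# Hypothetical helper names; the formulas are the usual peak-rate estimates.

def gddr5_bandwidth_gbs(bus_width_bits: int, memory_clock_mhz: float) -> float:
    """Peak memory bandwidth in GB/s.

    GDDR5 transfers four data words per listed memory clock, so
    1753 MHz corresponds to roughly 7 Gbps per pin.
    """
    data_rate_gbps = memory_clock_mhz * 4 / 1000
    return bus_width_bits / 8 * data_rate_gbps


def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Peak single precision: one fused multiply-add (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * clock_mhz / 1e6


def peak_fp64_tflops(fp32_tflops: float, fp64_divisor: int) -> float:
    """Peak double precision given the 1/N FP64:FP32 ratio from the table."""
    return fp32_tflops / fp64_divisor


# name: (bus bits, memory clock MHz, CUDA cores, base clock MHz, FP64 divisor)
cards = {
    "GTX Titan X": (384, 1753, 3072, 1002, 32),
    "GTX Titan Black": (384, 1753, 2880, 889, 3),
    "GTX 980 Ti": (384, 1753, 2816, 1002, 32),
    "GTX 980": (256, 1753, 2048, 1127, 32),
}

for name, (bus, mem_clk, cores, base_clk, fp64_div) in cards.items():
    fp32 = peak_fp32_tflops(cores, base_clk)
    fp64 = peak_fp64_tflops(fp32, fp64_div)
    print(f"{name}: {gddr5_bandwidth_gbs(bus, mem_clk):.0f} GB/s, "
          f"{fp32:.2f} TFLOPS FP32 / {fp64:.2f} TFLOPS FP64 at base clock")
```

Bandwidth comes out at 337 GB/s for the 384-bit cards and 224 GB/s for the 256-bit GTX 980, matching the table. The table's FP32 row does not appear to use one consistent clock, so these base-clock numbers only approximate some listed values; the GTX 980's 4.61 TFLOPS matches within rounding. The FP64 column shows why a 1/32 ratio leaves the Titan X far behind the Kepler-based Titan Black in double precision.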


So we talked about the core clocks, specifications and memory partitions. However, to better understand a graphics processor you need to break it down into small pieces. Let's first look at the raw data that most of you can understand and grasp. This bit is about the Maxwell GM200 architecture. NVIDIA’s “Maxwell” GPU architecture implements a number of architectural enhancements designed to extract more performance and better power efficiency (performance per watt).
 


So above, we see the GM200 block diagram that embodies the Maxwell architecture; Nvidia already started developing Maxwell around 2011/2012. The chip is organized into six GPCs (graphics processing clusters), each holding four SMM (streaming multiprocessor) clusters, for 24 in total. You'll also spot six 64-bit memory interfaces, which together form a 384-bit path to the graphics memory at 7 Gbps.

Let's break it down into bits and pieces. A fully enabled GM200 GPU has the following (the GTX 980 Ti, for example, is slightly cut down):

  • 3072 CUDA/Shader/Stream processors
  • 128 CUDA cores (shader processors) per SMM cluster
  • Over 8 Billion Transistors
  • 192 Texture units
  • 96 ROP units
  • 3MB L2 cache
  • 384-bit GDDR5
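The totals in this list follow from GM200's hierarchy. A minimal sketch, assuming six GPCs of four SMMs each (24 SMMs), 128 CUDA cores and 8 texture units per SMM, and six 64-bit memory controllers; the `GpuConfig` class and its field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    """Hypothetical model of a Maxwell chip's unit hierarchy."""
    gpcs: int                    # graphics processing clusters
    smm_per_gpc: int             # streaming multiprocessors per GPC
    cores_per_smm: int = 128     # CUDA cores per SMM
    tex_units_per_smm: int = 8   # texture units per SMM
    mem_controllers: int = 6     # memory interfaces
    bits_per_controller: int = 64

    @property
    def smms(self) -> int:
        return self.gpcs * self.smm_per_gpc

    @property
    def cuda_cores(self) -> int:
        return self.smms * self.cores_per_smm

    @property
    def texture_units(self) -> int:
        return self.smms * self.tex_units_per_smm

    @property
    def bus_width_bits(self) -> int:
        return self.mem_controllers * self.bits_per_controller


full_gm200 = GpuConfig(gpcs=6, smm_per_gpc=4)
print(full_gm200.smms, full_gm200.cuda_cores,
      full_gm200.texture_units, full_gm200.bus_width_bits)
# prints: 24 3072 192 384
```

A cut-down part such as the GTX 980 Ti (22 SMMs enabled) does not disable whole GPCs in this neat way, so for such SKUs the enabled SMM count is the number to start from.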
What about double precision? Ehm, sorry, it's dumbed down so as not to interfere with Quadro sales: overall double-precision instruction throughput is 1/32 of the single-precision rate. An important thing to focus on is the SMM (streaming multiprocessor) cluster, a block of 128 shader processors. Let's zoom in even further.
 
 


One SMM: 128 single-precision shader cores, double-precision units, special function units (SFU), and load/store units.
So based on a full 24-SMM, 3072-shader-core chip, the SMM looks fairly familiar in design. Further along the pipeline we run into the ROP (Raster Operation) engines, and the GM200 has a nice 96 of them for features like pixel blending and AA. Each SMM has 96 KB of dedicated shared memory, while the L1 caching function is combined with the texture cache, which can be utilized as a read-only cache. The GPU’s texture units are a valuable resource for compute programs that need to sample or filter image data. Per streaming multiprocessor, texture resources are lower than on Kepler (16 texture units per SMX): each Maxwell SMM contains 8 texture filtering units.
  • GeForce GTX 960 has 8 SMM x 8 texture units = 64
  • GeForce GTX 970 has 13 SMM x 8 texture units = 104
  • GeForce GTX 980 has 16 SMM x 8 texture units = 128
  • GeForce GTX 980 Ti has 22 SMM x 8 texture units = 176
  • GeForce GTX Titan X has 24 SMM x 8 texture units = 192

So the GTX 980 Ti, with 22 of the 24 SMMs enabled, offers 22 SMM x 8 TU = 176 texture filtering units, while the fully enabled silicon (as on the Titan X) has 24 x 8 = 192.
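The per-card figures above are one multiplication each. A small sketch, assuming 8 texture units per SMM as stated above; the dictionary restates the listed SMM counts and the helper name is hypothetical.

```python
TEX_UNITS_PER_SMM = 8
FULL_GM200_SMMS = 24

# Enabled SMM counts per card, as listed above. The GTX 960/970/980 use
# smaller Maxwell chips, so the full-GM200 share below only applies to the 980 Ti.
enabled_smms = {
    "GTX 960": 8,
    "GTX 970": 13,
    "GTX 980": 16,
    "GTX 980 Ti": 22,
    "GTX Titan X": 24,
}


def texture_units(smms: int) -> int:
    """Texture filtering units for a given number of enabled SMMs."""
    return smms * TEX_UNITS_PER_SMM


for card, smms in enabled_smms.items():
    print(f"{card}: {smms} x {TEX_UNITS_PER_SMM} = {texture_units(smms)} texture units")

share = enabled_smms["GTX 980 Ti"] / FULL_GM200_SMMS
print(f"GTX 980 Ti enables {share:.1%} of a full GM200")  # 91.7%
```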

Typically fewer texture units would mean lower performance, but these cards require little voltage and can be clocked very high, and that is how they deliver their performance at low power consumption.

To reduce DRAM bandwidth demands, NVIDIA GPUs use lossless compression as data is written out to memory. The bandwidth savings from this compression are realized a second time when clients such as the texture unit later read the data. As illustrated in the preceding figure, the compression engine has multiple layers of compression algorithms. Any block going out to memory is first examined to see whether its 4x2 pixel regions are constant, in which case the data is compressed 8:1 (i.e., from 256 B to 32 B of data, for 32-bit color). If that fails but 2x2 pixel regions are constant, the data is compressed 4:1.

These modes are effective for AA surfaces, but less so for 1xAA rendering. Therefore, starting with Fermi, Nvidia also implemented support for a “delta color compression” mode. In this mode, the GPU calculates the difference between each pixel in the block and its neighbor, and then tries to pack these difference values together using the minimum number of bits. For example, if pixel A’s red value is 253 (8 bits) and pixel B’s red value is 250 (also 8 bits), the difference is 3, which can be represented in only 2 bits. If the block cannot be compressed in any of these modes, the GPU writes the data out uncompressed, preserving the lossless rendering requirement.

The effectiveness of delta color compression depends on which pixel ordering is chosen for the delta calculation. Maxwell contains the third generation of delta color compression, which improves effectiveness by offering the compressor more choices of delta calculation. Thanks to these improvements in caching and compression, Maxwell is able to significantly reduce the number of bytes that have to be fetched from memory per frame.
Maxwell uses roughly 25% fewer bytes per frame than Kepler.
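The mode-selection cascade described above (constant 4x2 regions, then constant 2x2 regions, then delta encoding, else uncompressed) can be modeled in software. This is a simplified sketch under stated assumptions, not Nvidia's hardware format: it assumes an 8x8 block of 32-bit RGBA pixels (256 B), a single row-major delta ordering, one shared bit width for all deltas, and it ignores any header bits a real format would need. All function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, int]   # 8-bit R, G, B, A = 32-bit color
Block = List[List[Pixel]]           # 8 rows x 8 columns = 64 pixels = 256 bytes
BITS_PER_PIXEL = 32


def regions_constant(block: Block, width: int, height: int) -> bool:
    """True if every width x height tile in the block holds a single color."""
    for y0 in range(0, len(block), height):
        for x0 in range(0, len(block[0]), width):
            first = block[y0][x0]
            for y in range(y0, y0 + height):
                for x in range(x0, x0 + width):
                    if block[y][x] != first:
                        return False
    return True


def signed_bits(value: int) -> int:
    """Bits needed to hold value in two's complement (magnitude bits + sign bit).

    The article counts 2 bits for a difference of 3; storing the sign as
    well (250 - 253 = -3) takes one more bit.
    """
    return (value if value >= 0 else ~value).bit_length() + 1


def delta_bits(block: Block) -> int:
    """Size of a simple delta encoding: first pixel raw, then per-channel
    differences to the previous pixel in row-major order at one shared width."""
    pixels = [p for row in block for p in row]
    deltas = [c - prev_c
              for prev, cur in zip(pixels, pixels[1:])
              for prev_c, c in zip(prev, cur)]
    width = max((signed_bits(d) for d in deltas), default=0)
    return BITS_PER_PIXEL + len(deltas) * width


def choose_mode(block: Block) -> Tuple[str, int]:
    """Return (mode, compressed size in bytes) for one block."""
    raw_bytes = len(block) * len(block[0]) * BITS_PER_PIXEL // 8
    if regions_constant(block, 4, 2):
        return "8:1", raw_bytes // 8
    if regions_constant(block, 2, 2):
        return "4:1", raw_bytes // 4
    delta_bytes = -(-delta_bits(block) // 8)  # round up to whole bytes
    if delta_bytes < raw_bytes:
        return "delta", delta_bytes
    return "uncompressed", raw_bytes


flat = [[(10, 20, 30, 255)] * 8 for _ in range(8)]
pairs = [[((x // 2) * 10, 0, 0, 255) for x in range(8)] for _ in range(8)]
gradient = [[(250 + x % 4, 0, 0, 255) for x in range(8)] for _ in range(8)]

print(choose_mode(flat))      # ('8:1', 32)
print(choose_mode(pairs))     # ('4:1', 64)
print(choose_mode(gradient))  # ('delta', 99)
```

The row-major ordering is only one choice; as the text notes, the effectiveness of delta compression depends on the pixel ordering, which is exactly the knob Maxwell's third-generation compressor is said to expose more of.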








© 2022