Guru3D.com

MSI GeForce RTX 4080 Suprim X review - GPU Architecture

by Hilbert Hagedoorn on: 11/15/2022 04:05 PM

Digging deeper

The maximum number of shader clusters (NVIDIA calls them SMs, or streaming multiprocessors) on a GeForce RTX 4000 GPU is now 144, which is what a fully enabled AD102 provides. As with Ampere, each SM has 64 FP32 units and 64 FP32/INT32 units, four texture units, four fourth-generation Tensor cores, one third-generation RT core, and 128 KB of L1 cache. That is how a fully enabled AD102 GPU arrives at 18,432 FP32 shaders (144 × 128): half of them compute FP32 only, while the other half compute either FP32 or INT32. The ratio of these units to one another is identical to Ampere; NVIDIA has not changed it. The raster operations pipeline also keeps Ampere's 16 ROPs per raster engine.

The 128 KB of L1 cache in each Ada SM has a unified architecture that, depending on the workload, can be configured as either an L1 data cache or shared memory. The complete AD102 GPU includes 18,432 KB of L1 cache (compared to 10,752 KB in GA102).

Ada's Level 2 cache has been substantially redesigned relative to Ampere. AD102 has 98,304 KB (96 MB) of L2 cache, a 16-fold increase over the 6,144 KB in GA102. All applications should benefit from such a large cache pool, but complex workloads such as ray tracing (especially path tracing) stand to gain the most.
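The shader and cache totals above follow from multiplying the per-SM resources by the SM count. The sketch below is a minimal Python check under that assumption; the 84-SM figure for a full GA102 is not stated in the text and is simply what 10,752 KB divided by 128 KB per SM implies.

```python
# Per-SM resources of an Ada SM, as listed in the paragraph above.
ADA_SM = {
    "fp32_lanes": 64 + 64,  # 64 FP32 units + 64 FP32/INT32 units
    "texture_units": 4,
    "tensor_cores": 4,      # fourth generation
    "rt_cores": 1,          # third generation
    "l1_kb": 128,           # unified L1 data cache / shared memory
}


def totals(sm_count, per_sm):
    """Scale every per-SM resource by the number of enabled SMs."""
    return {name: sm_count * amount for name, amount in per_sm.items()}


full_ad102 = totals(144, ADA_SM)
print(full_ad102["fp32_lanes"])  # 18432 shaders
print(full_ad102["l1_kb"])       # 18432 KB of L1

# Assumption: a full GA102 has 84 SMs, each with 128 KB of L1.
print(84 * 128)                  # 10752 KB

# L2 cache: 98304 KB on AD102 versus 6144 KB on GA102.
print(98304 // 6144)             # 16
```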

The AD103 GPU, as configured for the GeForce RTX 4080 (76 of its 80 SMs enabled), includes:

  • 9728 CUDA Cores
  • 76 RT Cores
  • 304 Tensor Cores
  • 304 Texture Units
  • 112 ROPs
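The counts in this list are consistent with the per-SM layout described earlier: one RT core, four Tensor cores and four texture units for every 128 CUDA cores. A short sketch, assuming that fixed per-SM ratio, recovers them from the CUDA core count alone. ROPs are left out because they belong to the raster engines rather than to individual SMs.

```python
# Assumed fixed per-SM ratios for Ada, taken from the architecture description.
CUDA_CORES_PER_SM = 128
RT_CORES_PER_SM = 1
TENSOR_CORES_PER_SM = 4
TEXTURE_UNITS_PER_SM = 4


def config_from_cuda_cores(cuda_cores):
    """Derive SM, RT core, Tensor core and texture unit counts from CUDA cores."""
    sms, remainder = divmod(cuda_cores, CUDA_CORES_PER_SM)
    if remainder:
        raise ValueError("CUDA core count is not a whole number of SMs")
    return {
        "SMs": sms,
        "RT cores": sms * RT_CORES_PER_SM,
        "Tensor cores": sms * TENSOR_CORES_PER_SM,
        "Texture units": sms * TEXTURE_UNITS_PER_SM,
    }


print(config_from_cuda_cores(9728))
# {'SMs': 76, 'RT cores': 76, 'Tensor cores': 304, 'Texture units': 304}
```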
 

GeForce RTX 4000 (Ada Lovelace) block diagram

Based on these specifications alone, the GeForce RTX 4080 should be substantially faster than the RTX 3080 Ti. The RTX 4080 16GB has 9,728 shaders and the announced RTX 4080 12GB has 7,680 shaders, and both variants have boost clocks between 2.5 and 2.6 GHz.
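To fold the shader counts and the 2.5 to 2.6 GHz boost range into one number, theoretical peak FP32 throughput is commonly estimated as shaders × 2 FLOPs per clock (one fused multiply-add) × clock. The sketch below applies that rule of thumb to both RTX 4080 variants at each end of the quoted clock range; it is a paper figure, not a prediction of game performance.

```python
def peak_fp32_tflops(shaders, boost_ghz):
    """Theoretical FP32 peak: each shader retires one FMA (2 FLOPs) per clock."""
    return shaders * 2 * boost_ghz / 1000  # GFLOPS -> TFLOPS


for name, shaders in (("RTX 4080 16GB", 9728), ("RTX 4080 12GB", 7680)):
    low = peak_fp32_tflops(shaders, 2.5)
    high = peak_fp32_tflops(shaders, 2.6)
    print(f"{name}: {low:.1f} to {high:.1f} TFLOPS")

# RTX 4080 16GB: 48.6 to 50.6 TFLOPS
# RTX 4080 12GB: 38.4 to 39.9 TFLOPS
```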

NVIDIA GeForce RTX 4090

The new flagship GPU is the AD102-300, with 16,384 CUDA cores, a boost clock of up to 2520 MHz, and a board design with 23 power phases. This translates to a single-precision performance of 82.6 TFLOPS, 2.3x higher than its predecessor, the RTX 3090. The flagship Ada-based SKU includes 24GB of GDDR6X memory with a peak bandwidth of 1 TB/s. The card is rated at 450 W, 100 W more than the RTX 3090. NVIDIA confirmed availability on October 12 at $1599.
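The 82.6 TFLOPS and 2.3x figures can be reproduced with the usual peak-throughput rule (CUDA cores × 2 FLOPs × boost clock). The RTX 3090 reference specification used here (10,496 CUDA cores, 1,695 MHz boost) is not given in the text; it is NVIDIA's published figure, assumed for this sketch.

```python
def peak_fp32_tflops(cuda_cores, boost_mhz):
    """Theoretical FP32 peak: one fused multiply-add (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * boost_mhz / 1_000_000  # MFLOPS -> TFLOPS


rtx_4090 = peak_fp32_tflops(16384, 2520)
# Assumption: RTX 3090 reference spec of 10,496 CUDA cores at 1,695 MHz boost.
rtx_3090 = peak_fp32_tflops(10496, 1695)

print(f"RTX 4090: {rtx_4090:.1f} TFLOPS")    # RTX 4090: 82.6 TFLOPS
print(f"RTX 3090: {rtx_3090:.1f} TFLOPS")    # RTX 3090: 35.6 TFLOPS
print(f"Ratio: {rtx_4090 / rtx_3090:.2f}x")  # Ratio: 2.32x
```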

 

NVIDIA GeForce RTX 4080 

The RTX 4080 16GB will have an AD103 GPU with 9728 CUDA cores, 16GB of GDDR6X memory clocked at 22.4 Gbps, and a 320W TDP. This model will be available in November for at least $1199. 
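Peak memory bandwidth follows from the per-pin data rate and the memory bus width. The bus widths (256-bit for the RTX 4080, 384-bit for the RTX 4090) and the RTX 4090's 21 Gbps data rate are not stated in this section; they are NVIDIA's published specifications, assumed here for a minimal sketch.

```python
def peak_bandwidth_gb_s(data_rate_gbps, bus_width_bits):
    """Every pin moves data_rate_gbps gigabits per second; divide by 8 for bytes."""
    return data_rate_gbps * bus_width_bits / 8


# Assumption: 256-bit bus on the RTX 4080 16GB.
print(peak_bandwidth_gb_s(22.4, 256))  # 716.8 GB/s
# Assumption: 21 Gbps on a 384-bit bus for the RTX 4090 (the "1 TB/s" figure).
print(peak_bandwidth_gb_s(21.0, 384))  # 1008.0 GB/s
```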

DLSS 3

The NVIDIA Applied Deep Learning Research team spent the past four years developing a frame generation technique that combines optical flow estimates with DLSS. Inserting synthesized frames between rendered frames raises the frame rate and delivers smoother gameplay.

Optical flow estimation is widely used in computer vision to measure the direction and magnitude of the apparent motion of pixels between consecutive rendered frames or video frames. In 3D graphics and video, typical uses include reducing latency in augmented and virtual reality, smoothing video playback, improving video compression efficiency, and stabilizing video cameras. In deep learning, optical flow is used for automotive and robot navigation and for video analysis and understanding. Optical flow resembles the motion estimation stage of a video encoder, but its precision and consistency requirements are much stricter, so different algorithms are used.

Since the Turing architecture, NVIDIA GPUs have included an Optical Flow Accelerator (OFA) that uses state-of-the-art algorithms to produce high-quality results. Ada's OFA unit delivers up to 300 TeraOPS (TOPS) of optical flow work, over 2x faster than the Ampere-generation OFA, and supplies essential data to the DLSS 3 network. The Ada OFA unit and new motion vector analysis algorithms are what make accurate and efficient frame generation possible in DLSS 3. According to NVIDIA, this deep-learning-based frame generation increases frame rates by up to 2x compared to DLSS 2, and combined with the new RT cores and other Ada improvements, Ada GPUs are up to four times faster than their predecessors. DLSS 3 can also raise performance when the CPU, rather than the GPU, is the bottleneck.
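The core idea of optical-flow-based frame generation (estimate how content moved between two frames, then synthesize a frame partway along that motion) can be shown with a toy one-dimensional example. This is plain block matching on integer shifts, chosen only for illustration; it is not NVIDIA's OFA algorithm or the DLSS 3 network, which also use game motion vectors and a neural network. The function names and sample data are hypothetical.

```python
def estimate_shift(prev, curr, max_shift):
    """Toy 1-D motion estimate: the integer shift d that minimizes the mean
    absolute difference between curr[i] and prev[i - d] (block matching)."""
    n = len(prev)
    if not 0 <= max_shift < n:
        raise ValueError("max_shift must be smaller than the frame length")
    best_shift, best_cost = 0, float("inf")
    for d in range(-max_shift, max_shift + 1):
        pairs = [(curr[i], prev[i - d]) for i in range(n) if 0 <= i - d < n]
        cost = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best_shift, best_cost = d, cost
    return best_shift


def midpoint_frame(prev, curr, shift):
    """Synthesize an in-between frame by moving prev half of the way along the
    estimated motion; samples with no source in prev fall back to curr."""
    half = shift // 2  # integer samples only; real interpolators work sub-pixel
    n = len(prev)
    return [prev[i - half] if 0 <= i - half < n else curr[i] for i in range(n)]


prev = [1, 3, 7, 2, 8, 5, 4, 6, 0, 9]
curr = [5, 5, 1, 3, 7, 2, 8, 5, 4, 6]  # same content moved 2 samples right
d = estimate_shift(prev, curr, max_shift=3)
print(d)                              # 2
print(midpoint_frame(prev, curr, d))  # [5, 1, 3, 7, 2, 8, 5, 4, 6, 0]
```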
As for CPU-limited games, Microsoft Flight Simulator is a typical example because of its physics simulation and enormous draw distances. In such cases conventional super resolution delivers smaller gains, since rendering fewer pixels does not relieve the CPU. Because DLSS 3 generates frames on the GPU, it can still deliver up to double the performance here.
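A deliberately simplified model shows why frame generation helps where super resolution does not: if the rendered frame rate is capped by the CPU, making the GPU faster changes nothing, while inserting one generated frame after every rendered frame doubles the frames presented. The frame rates and the 1.8x super-resolution speedup below are hypothetical, and the model ignores the GPU time that frame generation itself costs as well as its effect on latency.

```python
def presented_fps(cpu_fps, gpu_fps, gpu_speedup=1.0, frame_generation=False):
    """Toy model: the slower of CPU and GPU sets the rendered frame rate.
    Super resolution (gpu_speedup) only speeds up the GPU side; frame
    generation adds one generated frame per rendered frame."""
    rendered = min(cpu_fps, gpu_fps * gpu_speedup)
    return rendered * 2 if frame_generation else rendered


cpu_fps, gpu_fps = 50.0, 60.0  # hypothetical CPU-limited scenario
print(presented_fps(cpu_fps, gpu_fps))                  # 50.0 (native)
print(presented_fps(cpu_fps, gpu_fps, 1.8))             # 50.0 (super resolution only)
print(presented_fps(cpu_fps, gpu_fps, 1.8, True))       # 100.0 (with frame generation)
```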




© 2023