Undervolting the Yeston GeForce RTX 4070 Ti (Page 5)

Page 5 of 20. Published 2023-07-26 09:59 by Krzysztof Hukalowicz

GPU Architecture

The full AD102 GPU contains 144 shader clusters (streaming multiprocessors, or SMs, in Nvidia’s terminology), the most of any GeForce RTX 4000 chip. As with Ampere, each SM has 64 FP32 units and 64 FP32/INT32 units, four texture units, four fourth-generation Tensor Cores, one third-generation RT Core, and 128 KB of L1 cache. That is how a fully enabled AD102 reaches 18,432 FP32 shaders (144 × 128): half of them compute only in FP32, while the other half handle either FP32 or INT32. Nvidia has not changed this ratio of unit types relative to Ampere. The raster operations pipeline is also unchanged, with 16 ROPs per raster engine.

The 128 KB L1 cache in each SM is unified and, depending on the workload, can be configured as either L1 data cache or shared memory. In total, the full AD102 has 18,432 KB of L1 cache, compared with 10,752 KB in GA102. The Level 2 cache has been redesigned far more substantially: AD102 carries 98,304 KB of L2 cache, sixteen times the 6,144 KB in GA102. All applications benefit from such a large cache pool to some degree, but complex workloads such as ray tracing (and path tracing in particular) gain the most.
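The chip-wide totals above follow directly from the per-SM figures. The short Python sketch below reproduces that arithmetic. The GA102 SM count of 84 is not stated in the text; it is inferred here from 10,752 KB ÷ 128 KB per SM, and the helper name `chip_totals` is purely illustrative.

```python
# Per-SM resources of an Ada SM, as listed in the text
FP32_ONLY_UNITS = 64      # units that compute only in FP32
FP32_INT32_UNITS = 64     # units that compute in FP32 or INT32
L1_KB_PER_SM = 128        # unified L1 / shared memory per SM


def chip_totals(sm_count: int) -> dict:
    """Scale the per-SM resources up to a whole chip (hypothetical helper)."""
    return {
        "fp32_shaders": sm_count * (FP32_ONLY_UNITS + FP32_INT32_UNITS),
        "fp32_or_int32": sm_count * FP32_INT32_UNITS,
        "l1_kb": sm_count * L1_KB_PER_SM,
    }


ad102 = chip_totals(144)   # full AD102
ga102 = chip_totals(84)    # full GA102; 84 SMs inferred from 10,752 KB / 128 KB

l2_ratio = 98_304 / 6_144  # AD102 L2 vs. GA102 L2, both in KB

print(ad102)           # {'fp32_shaders': 18432, 'fp32_or_int32': 9216, 'l1_kb': 18432}
print(ga102["l1_kb"])  # 10752
print(l2_ratio)        # 16.0
```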

The AD103 GPU as configured in the GeForce RTX 4080 (76 of the chip’s 80 SMs enabled) includes:

9728 CUDA Cores
76 RT Cores
304 Tensor Cores
304 Texture Units
112 ROPs
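Every entry in this list except the ROP count scales with the number of SMs, so the figures can be cross-checked against the per-SM ratios given earlier: 9,728 CUDA cores ÷ 128 per SM = 76 SMs. ROPs belong to the raster engines instead. The sketch below assumes seven raster engines with 16 ROPs each; that engine count is not stated in the text, and `derive_config` is a hypothetical helper.

```python
def derive_config(sm_count: int, raster_engines: int) -> dict:
    """Derive unit counts from SM and raster-engine counts using Ada per-SM ratios."""
    return {
        "cuda_cores": sm_count * 128,   # 64 FP32 + 64 FP32/INT32 units per SM
        "rt_cores": sm_count * 1,       # one RT Core per SM
        "tensor_cores": sm_count * 4,   # four Tensor Cores per SM
        "texture_units": sm_count * 4,  # four texture units per SM
        "rops": raster_engines * 16,    # 16 ROPs per raster engine
    }


sm_count = 9728 // 128                               # 76 SMs enabled
config = derive_config(sm_count, raster_engines=7)   # 7 raster engines: an assumption
print(config)
# {'cuda_cores': 9728, 'rt_cores': 76, 'tensor_cores': 304, 'texture_units': 304, 'rops': 112}
```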

DLSS 3

The NVIDIA Applied Deep Learning Research team has spent the past four years developing a frame generation technique that combines optical flow estimation with DLSS. Inserting synthesized frames between rendered frames raises the frame rate and makes gameplay look smoother.

Optical flow estimation is widely used in computer vision to measure the direction and magnitude of the apparent motion of pixels between consecutive rendered or video frames. In 3D graphics and video, typical uses include reducing latency in augmented and virtual reality, smoothing video playback, improving video compression efficiency, and stabilizing camera footage. Typical deep-learning applications include automotive and robotic navigation, video analytics, and video understanding. Optical flow resembles the motion estimation stage of a video encoder, but it demands much higher accuracy and consistency, so very different algorithms are used.

Since the Ampere architecture, NVIDIA GPUs have included an Optical Flow Accelerator (OFA) that uses state-of-the-art algorithms to produce high-quality results. Ada’s OFA delivers 300 TeraOPS (TOPS) of optical flow work, more than twice the throughput of the Ampere-generation OFA, and supplies essential input to the DLSS 3 network. The Ada OFA and new motion vector analysis algorithms are what allow DLSS 3 to generate frames accurately and efficiently. According to Nvidia, the new deep-learning-based frame generation raises frame rates by up to 2x over DLSS 2, and when DLSS 3 is combined with the new RT Cores and other Ada improvements, Ada GPUs are up to four times faster than their predecessors. DLSS 3 can also raise performance when the CPU, rather than the GPU, is the bottleneck.
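To make the idea of inserting a synthesized frame concrete, here is a deliberately simplified Python sketch of classic flow-based frame interpolation: each pixel of the earlier frame is moved halfway along its optical flow vector to build an in-between frame. This is not NVIDIA’s algorithm; DLSS 3 feeds OFA output and game motion vectors into a neural network. The function name `interpolate_midframe`, the rule that the faster-moving pixel wins when two pixels land on the same spot (a crude stand-in for depth ordering), and filling uncovered pixels from the later frame are all assumptions made for illustration.

```python
import math


def interpolate_midframe(frame_a, frame_b, flow):
    """Build a frame halfway between frame_a and frame_b (toy forward warping).

    frame_a, frame_b: 2D lists of pixel intensities with the same size.
    flow: 2D list of (dx, dy) vectors giving each pixel's motion from A to B.
    """
    h, w = len(frame_a), len(frame_a[0])
    mid = [[None] * w for _ in range(h)]
    best_motion = [[-1.0] * w for _ in range(h)]

    # Forward-splat every pixel of A halfway along its flow vector.
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            tx = x + math.floor(0.5 * dx + 0.5)  # round half up to a whole pixel
            ty = y + math.floor(0.5 * dy + 0.5)
            if not (0 <= tx < w and 0 <= ty < h):
                continue                         # pixel moved out of view
            motion = math.hypot(dx, dy)
            # Collision rule (assumption): the faster-moving pixel wins.
            if motion > best_motion[ty][tx]:
                best_motion[ty][tx] = motion
                mid[ty][tx] = frame_a[y][x]

    # Holes are areas uncovered by motion; take them from the later frame.
    for y in range(h):
        for x in range(w):
            if mid[y][x] is None:
                mid[y][x] = frame_b[y][x]
    return mid


# A bright pixel moves 4 pixels to the right between two 1x5 frames.
frame_a = [[1.0, 0.0, 0.0, 0.0, 0.0]]
frame_b = [[0.0, 0.0, 0.0, 0.0, 1.0]]
flow = [[(4, 0), (0, 0), (0, 0), (0, 0), (0, 0)]]
print(interpolate_midframe(frame_a, frame_b, flow))  # [[0.0, 0.0, 1.0, 0.0, 0.0]]
```

The generated frame places the pixel at its halfway position, which is why accurate flow matters: an error in the vector puts the object in the wrong place in every synthesized frame.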
Microsoft Flight Simulator is a typical example of a CPU-limited game because of its physics simulation and enormous draw distances. In such games, conventional super-resolution techniques gain little, since rendering fewer pixels does not lighten the CPU’s load. DLSS 3’s frame generation, however, still delivers up to twice the performance here.
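Why frame generation still helps when the CPU is the bottleneck can be shown with a simple throughput model. In the hypothetical sketch below, the rendered frame rate is capped by whichever of the CPU or GPU is slower; upscaling only raises the GPU’s rate, while frame generation adds one synthesized frame per rendered frame. The frame rates are invented for illustration, and the model ignores the GPU time that frame generation itself costs, so its 2x result is an upper bound, which is why the gain is described as "up to" twice.

```python
def displayed_fps(cpu_fps: float, gpu_fps: float, frame_generation: bool) -> float:
    """Idealized displayed frame rate (hypothetical model).

    cpu_fps: frames per second the CPU can prepare (game logic, physics, draw calls)
    gpu_fps: frames per second the GPU can render at the chosen resolution
    """
    rendered = min(cpu_fps, gpu_fps)  # the slower side sets the pace
    # Frame generation adds one synthesized frame per rendered frame;
    # its own GPU cost is ignored here, so this is an upper bound.
    return rendered * 2 if frame_generation else rendered


# Hypothetical CPU-limited scenario: the CPU tops out at 60 fps.
CPU_FPS = 60
native = displayed_fps(CPU_FPS, gpu_fps=50, frame_generation=False)    # 50
upscaled = displayed_fps(CPU_FPS, gpu_fps=90, frame_generation=False)  # 60, now CPU-bound
with_fg = displayed_fps(CPU_FPS, gpu_fps=90, frame_generation=True)    # 120
print(native, upscaled, with_fg)
```

Upscaling lifts the GPU rate from 50 to 90 fps, yet the display only reaches 60 fps because the CPU caps it; frame generation sidesteps that cap because the extra frames never pass through the CPU.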
